Research On Tag SNP Selection Method Based On Haplotype Identification

Posted on: 2014-01-11
Degree: Master
Type: Thesis
Country: China
Candidate: X D Li
Full Text: PDF
GTID: 2370330488499519
Subject: Software engineering
Abstract/Summary:
A single nucleotide polymorphism (SNP) is a DNA sequence polymorphism caused by variation of a single nucleotide at the genome level. In haplotype-based association studies, a subset of SNPs (called tag SNPs) is sufficient to capture most of the haplotype information. Many methods for tag SNP selection exist, but they still have two main drawbacks: high time complexity and poor compactness of the tag SNP subset, both of which raise the haplotyping cost of subsequent association studies. To address these problems, this paper proposes a tag SNP selection framework based on the ant colony algorithm. Because different experimental platforms have different requirements, the method adaptively selects the optimal combination of tag SNPs for the coverage rate set by the user.

The main work is as follows. SNP data is high-dimensional: it has a small number of samples and a large number of SNP sites. Tag SNP selection methods based on haplotype reconstruction therefore spend much time predicting the non-informative SNPs. In contrast, a combined sample-coverage method selects tag SNPs according to how well they cover the small set of samples, which does not depend on the non-tag SNP sites, and can therefore greatly reduce time complexity. Moreover, experimental platforms differ in characteristics such as error rate and so require different fractions of samples to be covered. This paper therefore proposes an adaptive selection method that finds the optimal tag SNP combination under different coverage ratios.

To reduce the time complexity of searching the space of SNP combinations, missing data in the original data set is first estimated with an averaging strategy, and the SNP data is then preliminarily screened according to complete linkage and minor allele frequency (MAF), which helps to eliminate sites that are reliably redundant. To improve search efficiency and obtain the minimum tag SNP subset, the ant colony algorithm is improved for tag SNP subset selection. The SNP data is first encoded according to its features and the combined coverage problem, and a fitness function is designed around the coverage rate set by the user. A path-selection operator is then designed for the ant colony algorithm, and the heuristic function is improved to find the optimal tag SNP subset. Finally, to verify the validity of the method, it is compared with several currently popular methods on multiple HapMap data sets. The experimental results show that the method has certain advantages in time complexity and in the compactness of the tag SNP subset.
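
The coverage-driven search described in the abstract can be illustrated with a small sketch. This is not the thesis's actual algorithm, whose encoding, fitness function and path-selection operator are not given here. It assumes that a tag set "covers" a pair of distinct haplotypes when the two differ at one or more tag sites, and that the coverage rate is the fraction of such pairs covered. The function names (`coverage`, `aco_tag_snps`), the parameters (`alpha`, `beta`, `rho`, the ant and iteration counts) and the pheromone update rule are hypothetical choices in the style of a textbook ant colony optimizer, and the preprocessing steps (imputation, linkage and MAF screening) are omitted.

```python
import math
import random
from itertools import combinations


def coverage(haplotypes, tags):
    """Assumed coverage rate: share of distinct haplotype pairs that
    differ at one or more of the tag SNP positions."""
    pairs = [(a, b) for a, b in combinations(haplotypes, 2) if a != b]
    if not pairs:
        return 1.0
    covered = sum(1 for a, b in pairs if any(a[i] != b[i] for i in tags))
    return covered / len(pairs)


def aco_tag_snps(haplotypes, target=1.0, n_ants=10, n_iter=30,
                 alpha=1.0, beta=2.0, rho=0.1, seed=0):
    """Hypothetical ant colony search for a small tag SNP set whose
    coverage rate reaches `target` (a value in [0, 1])."""
    rng = random.Random(seed)
    n_snps = len(haplotypes[0])
    pairs = [(a, b) for a, b in combinations(haplotypes, 2) if a != b]
    # splits[j]: indices of the haplotype pairs that SNP j distinguishes.
    splits = [{k for k, (a, b) in enumerate(pairs) if a[j] != b[j]}
              for j in range(n_snps)]
    # Number of pairs that must be covered to meet the target rate.
    needed = math.ceil(target * len(pairs) - 1e-9)
    pheromone = [1.0] * n_snps
    best = None
    for _ in range(n_iter):
        for _ in range(n_ants):
            covered, tags = set(), []
            while len(covered) < needed:
                # Only SNPs that cover some still-uncovered pair are eligible.
                candidates = [j for j in range(n_snps)
                              if j not in tags and splits[j] - covered]
                if not candidates:
                    break
                # Pheromone times heuristic (newly covered pair count).
                weights = [pheromone[j] ** alpha
                           * len(splits[j] - covered) ** beta
                           for j in candidates]
                j = rng.choices(candidates, weights=weights)[0]
                tags.append(j)
                covered |= splits[j]
            reached = len(covered) >= needed
            if reached and (best is None or len(tags) < len(best)):
                best = sorted(tags)
        # Evaporate, then reinforce the SNPs of the best set found so far.
        pheromone = [(1.0 - rho) * p for p in pheromone]
        if best is not None:
            for j in best:
                pheromone[j] += 1.0 / len(best)
    return best


# Toy data (made up for illustration): four haplotypes over six SNP sites.
haps = ["000000", "001101", "110010", "111111"]
tags = aco_tag_snps(haps, target=1.0)
print(tags, coverage(haps, tags))
```

Lowering `target` lets the search stop earlier and return a smaller set, which is one way to read the adaptive, coverage-dependent selection the abstract describes.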
Keywords/Search Tags: Single Nucleotide Polymorphism, tag SNPs, ant colony algorithm