Research On The Diploid Haplotype Reconstruction Algorithm Based On The Enumeration Strategy

Posted on: 2017-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Chen
GTID: 2180330488975446
Subject: Computer application technology
Abstract/Summary:
SNP analysis plays an important role in exploring the genetic relationships of biological populations and in analyzing disease association. Compared with a single SNP, however, a haplotype contains richer genetic information and plays a key role in gene-related studies and medical treatment. Because obtaining haplotypes experimentally is very expensive, computational methods for acquiring haplotypes, i.e., the haplotype reconstruction problem, have been proposed and have received widespread attention. In this paper, the diploid haplotype reconstruction problem is studied. The concrete work is as follows.

The haplotype reconstruction problem is studied under the minimum error correction (MEC) model, and a reconstruction algorithm, EHDMS (Enumeration Haplotyping Diploid with More Support), which selects an enumeration value based on support degree, is presented. EHDMS reconstructs the SNP sites of a pair of haplotypes one after another. For the SNP site being reconstructed, it enumerates the two possible SNP values and chooses the one with more support from the SNP fragments covering that site. The CEPH haplotype samples released by HapMap are used in the experiments. CELSIM and MetaSim, two sequencing fragment simulators, are adopted to generate the test data. The reconstruction rate and running time of the EHDMS, FAHR, Fast Hare, and DGS algorithms are compared under different coverages, error rates, single fragment lengths, and haplotype lengths. Experimental results indicate that, in most cases, EHDMS achieves a higher reconstruction rate than the other three algorithms with high efficiency.

A second reconstruction algorithm, EHDLD (Enumeration Haplotyping Diploid with Least Difference), which selects an enumeration value based on difference degree, is presented for solving the minimum error correction model.
When enumerating the two possible SNP values for a site, EHDLD calculates, for each value, the sum of the distances between the haplotypes and the fragments covering the site, and chooses the SNP value corresponding to the smaller distance sum, i.e., the least difference degree. Experimental results show that the EHDMS and EHDLD algorithms have similar performance, and in most cases they obtain a higher reconstruction rate than the FAHR, Fast Hare, and DGS algorithms.

In conclusion, this paper presents two individual haplotype reconstruction algorithms, EHDMS and EHDLD, based on an enumeration strategy. The experimental results show that the two algorithms obtain a very high reconstruction rate with high efficiency. They are effective methods for reconstructing diploid haplotypes.
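The abstract gives only the outline of the two selection rules, so the following Python sketch is an illustration under stated assumptions and not the thesis's actual implementation. It assumes that fragments are strings over `0`, `1`, and `-` (a gap), that every site is heterozygous so the two haplotypes are complementary, and that the first site is fixed arbitrarily. The function names (`support`, `difference`, `reconstruct`, `mec_score`) and the way fragments are assigned to a haplotype are hypothetical choices made for this example.

```python
GAP = "-"  # assumed symbol for a site not covered by a fragment


def flip(allele):
    """Return the complementary allele (sites are assumed heterozygous)."""
    return "1" if allele == "0" else "0"


def distance(fragment, haplotype, upto):
    """Count mismatches between a fragment and a haplotype over sites [0, upto)."""
    return sum(1 for f, h in zip(fragment[:upto], haplotype[:upto])
               if f != GAP and f != h)


def mec_score(fragments, h1, h2):
    """MEC cost: each fragment is charged its distance to the closer haplotype."""
    n = len(h1)
    return sum(min(distance(f, h1, n), distance(f, h2, n)) for f in fragments)


def support(fragments, h1, h2, j, v):
    """Support-style rule: count covering fragments that agree with value v at site j.

    A fragment is assigned to the haplotype it matches better on the sites
    already reconstructed; fragments with no preference are skipped.
    """
    count = 0
    for f in fragments:
        if f[j] == GAP:
            continue
        d1, d2 = distance(f, h1, j), distance(f, h2, j)
        if d1 == d2:
            continue
        expected = v if d1 < d2 else flip(v)
        count += f[j] == expected
    return count


def difference(fragments, h1, h2, j, v):
    """Difference-style rule: distance sum over covering fragments if site j takes v."""
    c1, c2 = h1 + [v], h2 + [flip(v)]
    return sum(min(distance(f, c1, j + 1), distance(f, c2, j + 1))
               for f in fragments if f[j] != GAP)


def reconstruct(fragments, n_sites, rule="support"):
    """Rebuild a haplotype pair site by site, enumerating both values per site."""
    h1, h2 = ["0"], ["1"]  # first site fixed arbitrarily (symmetry assumption)
    for j in range(1, n_sites):
        if rule == "support":
            best = max("01", key=lambda v: support(fragments, h1, h2, j, v))
        else:
            best = min("01", key=lambda v: difference(fragments, h1, h2, j, v))
        h1.append(best)
        h2.append(flip(best))
    return "".join(h1), "".join(h2)


# Toy input: five error-free fragments plus one ("0100") with an error at site 2.
fragments = ["011-", "-110", "100-", "-001", "01-0", "0100"]
pair_support = reconstruct(fragments, 4, rule="support")
pair_difference = reconstruct(fragments, 4, rule="difference")
```

On this toy input both rules pick the same value at every site, which is consistent with the abstract's remark that the two algorithms perform similarly. `mec_score` can then be used to compare the resulting pair against alternatives.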
Keywords/Search Tags:Single nucleotide polymorphism, Haplotype, MEC, Enumeration, Reconstruction