Font Size: a A A

Algorithmic Strategies And Experimental Comparisons For The Single Individual Haplotype Assembly Problem By Minimum Error Correction

Posted on:2011-07-12Degree:MasterType:Thesis
Country:ChinaCandidate:S Y ChenFull Text:PDF
GTID:2120330332488153Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Haplotyping plays an important role in locating complex disease susceptibility genes. The cost of time and money is too expensive to detect the individual haplotype under the current experiment technique, so using computer to confirm the individual haplotype has great significance. The haplotype assembly problem is a computational problem that, given a set of DNA sequence fragment data of an individual, induces the corresponding haplotypes. Most of the models for the problem have been proven to be NP-hard, and there are not practical algorithms for them.There are several models for the problem, of which we are interested in the Minimum Error Correction (MEC) model. Because of the MEC model is the most practical and least loss of information model. This paper emphasizes on the use of MCMC method for haplotype assembly problem by MEC model firstly. The MCMC algorithm for assembling haplotypes from sequenced DNA fragments uses a Markov Chain formed from construction haplotyes. Subsequently, we reference the innovative thinking of variation structure graph, and then propose a heuristic method for the MEC problem based on the genetic algorithm called GAMEC. We design the fitness function and heuristic operators to selection, crossover and mutation in populations formed by the haplotypes. We have programmed and applied our methods to assembly haplotypes using simulation data and whole-genome shotgun sequence data of 19 chromosome from a human individual. Numerical experiments show that the two algorithms have very good performance. In almost all cases, it can find optimal solutions. The MCMC algorithm is more scalable and applicable in practice, and the GAEMC algorithm can get the accurate result quickly. The two algorithms represent a general framework for haplotype assembly that can be applied to sequence data generated by othe sequencing technologies.
Keywords/Search Tags:Haplotype Assembly, MEC Model, MCMC Algorithm, Genetic Algorithm
PDF Full Text Request
Related items