Research On Algorithms Of Reconstructing K Haplotypes

Posted on: 2015-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Wang
GTID: 2250330431957569
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of next-generation sequencing technology and the increasingly wide use of haplotype data in areas such as human genetics, haplotype research has begun to turn to other species. Because of limitations in sequencing technology, determining haplotypes directly through biological experiments is too expensive. The study of haplotype assembly from DNA fragment data is therefore necessary for the wide adoption of these applications. In many other species the number of haplotypes is greater than 2, so existing algorithms for the diploid haplotype assembly problem cannot be used, and the study of K-haplotype assembly algorithms has important scientific and practical significance. This thesis mainly studies K-haplotype assembly algorithms.

This thesis introduces the background and significance of the K-haplotype assembly problem and describes the current state and progress of research on it. The K-haplotype problem falls into two cases: the value of K is either known or unknown. The thesis studies the two cases separately and proposes a solution based on genetic algorithms for each. The performance of the algorithms is analyzed through a large number of experiments. The details are as follows.

In the case where K is known, the problem of triploid individual haplotype reconstruction is studied. Based on the minimum error correction model, a genetic algorithm, GTIHR, is proposed for triploid individual haplotype reconstruction. The algorithm uses a novel chromosome encoding method and effective genetic operators. The short chromosome encoding yields a smaller solution space, which lets the algorithm converge quickly toward the optimal solution.
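
As an illustration of the minimum error correction (MEC) model mentioned above, the following minimal sketch scores a candidate set of K haplotypes against a set of SNP fragments: each fragment is assigned to its closest haplotype, and the mismatches are summed. The string representation, the `GAP` symbol, and the names `distance` and `mec_score` are assumptions made for this example; they are not taken from the thesis and do not reproduce the GTIHR encoding.

```python
from typing import Sequence

GAP = "-"  # assumed symbol for SNP sites that a fragment does not cover


def distance(fragment: str, haplotype: str) -> int:
    """Count covered sites where the fragment disagrees with the haplotype."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != GAP and f != h)


def mec_score(fragments: Sequence[str], haplotypes: Sequence[str]) -> int:
    """Sum, over all fragments, the distance to the closest haplotype.

    This is the number of fragment entries that would have to be corrected
    so that every fragment agrees with one of the K haplotypes.
    """
    return sum(min(distance(f, h) for h in haplotypes) for f in fragments)


# Hypothetical toy data: K = 3 haplotypes over 4 SNP sites (triploid case).
haplotypes = ["0101", "1100", "0011"]
fragments = ["01--", "-10-", "--11", "1-01"]
print(mec_score(fragments, haplotypes))
```

A genetic algorithm built on this model would use a score of this kind as its fitness function, preferring candidate haplotype sets with a lower value.
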
In addition, the genetic operators proposed for GTIHR avoid premature convergence by injecting random information into chromosomes, and during optimization they make effective use of the SNP fragment information to revise the chromosome encoding. Since real DNA fragment data is generally difficult to obtain, the shotgun sequencing fragment generator CELSIM is used to generate fragment data for the experiments. The tests show that GTIHR achieves a higher haplotype reconstruction rate, which demonstrates its practical value.

In the case where K is unknown, the haplotype reconstruction problem for viral quasispecies is studied. Working from "error-corrected" fragments, a genetic algorithm, GVQHR, is proposed to solve this problem. An effective chromosome encoding and hill-climbing operators are designed around the characteristics of viral quasispecies assembly. Chromosomes are encoded as variable-length collections of character strings, while the hill-climbing operators first randomly remove some haplotypes from the quasispecies and then rebuild a new quasispecies from the remaining haplotypes and the fragment collection. The algorithm is tested experimentally on HIV-1 viral genes. The results show that, under various parameter settings, GVQHR obtains better reconstruction results, which are of value for further research.

This thesis studied algorithms for K-haplotype assembly and proposed methods that better solve the reconstruction of triploid individual haplotypes and viral quasispecies haplotypes. These studies provide a reference and an effective basis for research on haplotype data in other biological species.
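
The remove-and-rebuild idea behind the hill-climbing operators described above for the unknown-K case can be sketched roughly as follows. This is not the GVQHR operator itself: the gapped-string fragment format, the greedy `rebuild` procedure, and the acceptance rule (keep the candidate only if it does not use more haplotypes) are all assumptions chosen to keep the example small, and every function name is hypothetical.

```python
import random
from typing import List

GAP = "-"  # assumed marker for positions that a fragment does not cover


def compatible(a: str, b: str) -> bool:
    """True if a and b agree at every position that both of them cover."""
    return all(x == y or x == GAP or y == GAP for x, y in zip(a, b))


def merge(a: str, b: str) -> str:
    """Combine two compatible strings, filling the gaps of a from b."""
    return "".join(y if x == GAP else x for x, y in zip(a, b))


def explains_all(haplotypes: List[str], fragments: List[str]) -> bool:
    """True if every fragment is compatible with at least one haplotype."""
    return all(any(compatible(f, h) for h in haplotypes) for f in fragments)


def rebuild(kept: List[str], fragments: List[str]) -> List[str]:
    """Greedily add haplotypes until every fragment is explained.

    Fragments not explained by `kept` are merged into new (possibly
    still gapped) haplotypes whenever they are compatible.
    """
    new: List[str] = []
    for frag in fragments:
        if any(compatible(frag, h) for h in kept):
            continue
        for i, partial in enumerate(new):
            if compatible(frag, partial):
                new[i] = merge(partial, frag)
                break
        else:
            new.append(frag)
    return list(kept) + new


def hill_climb_step(quasispecies: List[str], fragments: List[str],
                    rng: random.Random, n_remove: int = 1) -> List[str]:
    """Randomly remove haplotypes, rebuild, and keep the result if no larger."""
    kept = list(quasispecies)
    for _ in range(min(n_remove, len(kept))):
        kept.pop(rng.randrange(len(kept)))
    candidate = rebuild(kept, fragments)
    return candidate if len(candidate) <= len(quasispecies) else list(quasispecies)


# Hypothetical toy fragments; start from one haplotype per fragment.
fragments = ["ACG---", "--GTTA", "ATG---", "--GTCA", "-CGT--"]
rng = random.Random(0)
quasispecies = list(fragments)
for _ in range(10):
    quasispecies = hill_climb_step(quasispecies, fragments, rng)
print(len(quasispecies), explains_all(quasispecies, fragments))
```

Because `rebuild` always returns a set that explains every fragment, each step preserves that property while never increasing the number of haplotypes, which is why the set is represented as a variable-length list.
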
Keywords/Search Tags: Haplotype, Triploid, Viral Quasispecies, Genetic Algorithm, Assembly