
Research On Polyploid Haplotype Assembly Algorithms

Posted on: 2020-12-24 | Degree: Master | Type: Thesis
Country: China | Candidate: X Yu | Full Text: PDF
GTID: 2370330590985975 | Subject: Software engineering
Abstract/Summary:
Human diseases are directly or indirectly related to genes. Studying the differences in gene sequences among individuals plays an important role in understanding human genetics and preventing disease. A single nucleotide polymorphism (SNP) is a DNA sequence polymorphism caused by variation in a single nucleotide at the genomic level. A sequence of SNPs in one region that tends to be inherited by offspring as a whole is called a haplotype. Because of the limitations of sequencing technology, however, it is very difficult to obtain a complete haplotype sequence directly by sequencing, so assembling the sequenced fragments into haplotypes has become a new challenge. Existing formulations of the haplotype assembly problem are mostly based on optimization criteria such as minimum SNP removal (MSR), minimum fragment removal (MFR), and minimum error correction (MEC). Most of these problems are NP-hard, and polyploid haplotype assembly in particular lacks effective practical algorithms because its phasing is more complicated. Reconstructing the multiple haplotypes of a polyploid genome from sequencing fragments has become feasible thanks to the reduced cost of next-generation sequencing and the increase in fragment length. This thesis presents two algorithms, Qhap and QChap, for polyploid haplotype assembly from next-generation sequencing data. Both algorithms are based on improvements to the MEC model. The Qhap algorithm greatly reduces time complexity by limiting the maximum number of flips in each column of the SNP matrix. It also introduces a confidence analysis so that the resulting haplotypes agree more closely with the true ones. For fragments obtained by sequencing a k-ploid genome, the algorithm attempts to divide the fragments into k groups such that the sum of the confidence scores of the flipped sites is minimized. The QChap algorithm builds on Qhap: the maximum number of flips per column is changed from a fixed value to a value that is adjusted dynamically according to the sequencing error rate and the coverage of each column. Extensive experiments on simulated and real data show that Qhap and QChap can effectively solve the polyploid haplotype assembly problem and are faster and more accurate than recent polyploid haplotype assembly algorithms.
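
The grouping objective described in the abstract can be illustrated with a minimal Python sketch. This is not the Qhap or QChap implementation, which the abstract does not specify. It only evaluates one candidate partition of fragments into k groups under a confidence-weighted MEC cost. The gap symbol, the per-site weights, the function names, and the formula in column_flip_cap are all assumptions made for illustration.

```python
from math import ceil

GAP = "-"  # assumed symbol for a site that a fragment does not cover


def consensus(fragments, weights, n_sites):
    """Weighted-majority allele per site for one group of fragments."""
    hap = []
    for j in range(n_sites):
        totals = {}
        for frag, w in zip(fragments, weights):
            if frag[j] != GAP:
                totals[frag[j]] = totals.get(frag[j], 0.0) + w[j]
        # Uncovered sites stay gaps; ties go to the smallest allele label.
        hap.append(min(totals, key=lambda a: (-totals[a], a)) if totals else GAP)
    return "".join(hap)


def weighted_mec(fragments, weights, groups, k):
    """Score one partition of fragments into k groups.

    weights[i][j] is a hypothetical confidence score for site j of
    fragment i (for example, derived from base quality). Returns the
    consensus haplotypes, the summed confidence of all flipped sites,
    and the number of flips in each column of the SNP matrix.
    """
    n_sites = len(fragments[0])
    total_cost = 0.0
    flips_per_column = [0] * n_sites
    haplotypes = []
    for g in range(k):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        hap = consensus([fragments[i] for i in idx],
                        [weights[i] for i in idx], n_sites)
        haplotypes.append(hap)
        for i in idx:
            for j in range(n_sites):
                # A covered site that disagrees with the consensus is a flip.
                if fragments[i][j] != GAP and fragments[i][j] != hap[j]:
                    total_cost += weights[i][j]
                    flips_per_column[j] += 1
    return haplotypes, total_cost, flips_per_column


def column_flip_cap(coverage, error_rate, slack=1):
    """Illustrative per-column flip limit (an assumed formula, not
    QChap's): expected number of erroneous reads plus a small margin."""
    return ceil(coverage * error_rate) + slack


# Toy triploid example (k = 3) with biallelic sites coded 0/1.
fragments = ["0101", "0111", "1010", "10-0", "1100", "1100"]
weights = [[1.0] * 4 for _ in fragments]
weights[1][2] = 0.25  # low-confidence call at site 2 of fragment 1
groups = [0, 0, 1, 1, 2, 2]
haplotypes, cost, flips = weighted_mec(fragments, weights, groups, k=3)
```

In this toy input the only disagreement inside a group is the low-confidence call, so it is the site that gets flipped. A search procedure would compare such costs across candidate partitions and could discard candidates whose flips in some column exceed the cap.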
Keywords/Search Tags: MEC model, polyploid haplotype assembly, confidence, maximum number of flips