Font Size: a A A

Research On Haplotype Assembly Algorithm Based On Graph

Posted on:2016-06-11Degree:MasterType:Thesis
Country:ChinaCandidate:F F TaoFull Text:PDF
GTID:2180330461977084Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Genome-wide haplotype reconstruction from sequence data or haplotype assembly faces enormous challenges in molecular biology and life sciences. For complex eukary-otic organ-isms like humans, the genome is vast and the population samples are grow-ing so rapidly that people have to use high-throughput sequencing technology for gene sequencing, the accuracy and computational efficiency thus could be ensured with the help of advance computer technology. Standard genome sequencing workflows produce contiguous DNA segments that containing a single nucleotide polymorphisms (SNP) spot of an unknown chromosomal origin. By mapping the sequencing reads to the ref-erence sequence, the origin can be determined. Sequence reads are valuable when they contain two or more SNP phasings. The haplotype assembly problem aims to compute the haplotype sequences for each given chromosome’s mutation.In this work, we present a novel Resolving Conflict Graph Algorithm for haplotype assembly of densely sequenced human genome data. Our algorithm could produce a weighted and undirected graph through the fragments and the relation between SNP phasings, i.e. a graph model containing single nucleotide polymorphisms (SNP) con-flicting cycles. The nods of the graph corresponding to single nucleotide polymor-phisms (SNP) phasings and edges are the fragments that supporting the corresponding two SNP phasings; and the edge weight was defined as the sum of weights of the frag-ments that supporting the SNP phasings. Our algorithm could determine the conflict circle formed by sequencing errors. By resolving conflict cycles, the SNP phasings could be adjusted, then, the haploid sequence could be outputted finally. Through com-paring with the other algorithms that corresponding to three different models, our algo-rithm has certain advantages in reducing the error rate of the fragment.
Keywords/Search Tags:Haplotype assembly, single nucleotide polymorphism(SNP), maxi- mum spanning tree, Conflicting Graph, disjoint set
PDF Full Text Request
Related items