Font Size: a A A

Research On De Bruijin Graph For DNA Sequence Assembly

Posted on:2012-10-09Degree:MasterType:Thesis
Country:ChinaCandidate:D Y WangFull Text:PDF
GTID:2210330362450458Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Bioinformatics is a comprehensive science field about the acquisition, storage, analysis and other aspects of biological data, and plays an important role in life science. DNA sequencing is one of the most basic directions of bioinformatics research. However, most genomes are not a one-time gain. So we need to use DNA assembly technique to splice the fragment obtained in experiments.Currently, there are mainly two strategies to assembly DNA fragments: The Hamilton Path algorithm and the Euler Path algorithm. The Hamilton Path algorithm will cause NP-complete problems, with high complexity. The Euler Path algorithm transformed the DNA fragment assembly problem into a problem to find Euler path in the de Bruijn graph, and it is a linear time algorithm, but need more storage than the Hamilton Path algorithm.With the development of DNA sequencing technology, the fragments obtained in experiments become shorter. The Euler Path algorithm has more advantages to deal with these shorter fragments, and is an important direction of DNA assembly.The construction of de Bruijn graph is a key step of the Euler Path algorithm.But the way of construction of de Bruijn graph has never changed. We always make k-1 base pair overlap between two k-mer. But the study of this paper finds that if we make less than k-2 base pair overlap between two k-mer. The construction of de Bruijn graph will be changed strongly. In this paper, we will make a detailed analysis of these effects. And we design a inquiry system which provide information about the relation between the number of dislocation between two k-mer and the construction of de Bruijn graph. The running results show that the dislocation of k-mers will significantly affect the construction of de Bruijn graph. If the existing algorithms can consider the effect, and chose a suitable dislocation to make a simpler de Bruijn graph, there may be better assembly results.
Keywords/Search Tags:DNA assembly, Euler path, de Bruijn graph, information system
PDF Full Text Request
Related items