Algorithm Research Of DNA Contig Merger Based On BWT

Posted on: 2013-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: F Yang
GTID: 2250330392969499
Subject: Computer technology
Abstract/Summary:
Since 1977, the development of genome sequencing technology has encouraged research in molecular genetics. The rapid development of molecular biology and of next-generation sequencing technology has brought tremendous changes to genome molecular biology. With third-generation sequencing technology now developing, large amounts of genome sequencing data are easy to obtain. Unlike first-generation sequencing technology, which generates long fragments, next-generation sequencing technology produces short reads and has shortcomings such as a high error rate. At the same time, its high throughput and low cost have stimulated research on genome assembly algorithms. This revolutionary development of sequencing technology poses new challenges for the design of genome assembly algorithms.

The paper studies the DNA contig merging algorithm, which is an important part of whole-genome assembly. In most research, contig merging simply follows the process of assembling the DNA fragments. It is therefore necessary and worthwhile to propose an independent contig merging algorithm.

The paper presents a novel contig merging algorithm based on the Burrows-Wheeler transform (BWT), which builds an index structure over the DNA contigs as the reference sequence. The search for the position of each mate-pair on the DNA contigs is transformed into sequence matching against the BWT index, which improves time efficiency when processing vast amounts of sequencing data. To reduce memory usage, the BWT index structure is stored in sampled form. In the experiment, the positions of the mate-pairs on the contigs are saved in a data structure; by comparing the relations among contigs, the most relevant contigs are found and merged into a longer base sequence. Finally, the contig merging result is output.
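The BWT-based lookup described above can be illustrated with a minimal sketch. It is not the thesis implementation: the class `FMIndex`, the parameter `sa_step`, and the choice to sample only the suffix array (while keeping a full occurrence table for brevity) are assumptions made for illustration, since the abstract does not say which parts of the index are sampled.

```python
class FMIndex:
    """Toy BWT index over one reference string (hypothetical helper)."""

    def __init__(self, reference, sa_step=4):
        text = reference + "$"  # '$' sorts before A, C, G, T
        # Naive suffix array construction; fine for a sketch, too slow for genomes.
        sa = sorted(range(len(text)), key=lambda i: text[i:])
        # BWT character of a row is the character preceding its suffix.
        self.bwt = "".join(text[i - 1] for i in sa)
        # Sampling: keep only rows whose text position is a multiple of sa_step.
        self.sampled_sa = {row: pos for row, pos in enumerate(sa)
                           if pos % sa_step == 0}
        # C[c] = number of characters in the text strictly smaller than c.
        self.C, total = {}, 0
        for c in sorted(set(text)):
            self.C[c] = total
            total += text.count(c)
        # occ[c][i] = occurrences of c in bwt[:i] (kept in full here for brevity).
        self.occ = {c: [0] * (len(text) + 1) for c in self.C}
        for i, ch in enumerate(self.bwt):
            for c in self.C:
                self.occ[c][i + 1] = self.occ[c][i] + (1 if ch == c else 0)

    def _lf(self, row):
        """LF-mapping: row of the suffix that starts one position earlier."""
        ch = self.bwt[row]
        return self.C[ch] + self.occ[ch][row]

    def _locate(self, row):
        """Recover a text position by walking back to the nearest sampled row."""
        steps = 0
        while row not in self.sampled_sa:
            row = self._lf(row)
            steps += 1
        return self.sampled_sa[row] + steps

    def find(self, pattern):
        """Backward search: return sorted start positions of exact matches."""
        lo, hi = 0, len(self.bwt)  # half-open range of matching rows
        for ch in reversed(pattern):
            if ch not in self.C:
                return []
            lo = self.C[ch] + self.occ[ch][lo]
            hi = self.C[ch] + self.occ[ch][hi]
            if lo >= hi:
                return []
        return sorted(self._locate(r) for r in range(lo, hi))


index = FMIndex("ACGTACGTGA")
print(index.find("ACG"))
```

A larger `sa_step` stores fewer suffix-array entries at the cost of more LF-mapping steps per located match, which is the usual memory versus time trade-off of a sampled index. This sketch handles exact matches only; tolerating sequencing errors would need an extension that the abstract does not describe.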
At the same time, taking into account the positional relationship between adjacent contigs, the merging result is improved by correcting overlaps and filling gaps. Finally, the merged contig sequences are output.

The paper proposes a contig merging algorithm that is an independent process within whole-genome assembly. By using the BWT structure and its fast sequence matching method, memory usage is reduced, the speed of the algorithm is improved, and the contig sequences are merged. The merging result is then refined. In the end, 84% of all contigs are successfully merged, and the scaffold sequences are output.
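The overlap correction and gap filling step can be sketched as follows. This is only an illustration under simplifying assumptions, not the thesis method: the functions `estimate_gap` and `join_contigs`, the `slack` parameter, the use of the median, and the convention that the insert size spans from the start of the first mate to the end of the second mate (both already placed in the same orientation) are all hypothetical.

```python
from statistics import median


def estimate_gap(len_a, links, insert_size, read_len):
    """Estimate the distance between contig A's end and contig B's start.

    links: (pos_a, pos_b) pairs, the start of mate 1 on A and of mate 2 on B.
    A negative result suggests that the two contigs overlap.
    """
    gaps = []
    for pos_a, pos_b in links:
        tail_a = len_a - pos_a      # bases from mate 1's start to the end of A
        head_b = pos_b + read_len   # bases from B's start to the end of mate 2
        gaps.append(insert_size - tail_a - head_b)
    return round(median(gaps))


def join_contigs(contig_a, contig_b, gap, slack=2):
    """Join two adjacent contigs given an estimated gap.

    gap >= 0: pad with 'N' placeholders for the unknown bases.
    gap < 0: look for an exact suffix/prefix overlap near the estimate.
    Returns None when no consistent overlap is found.
    """
    if gap >= 0:
        return contig_a + "N" * gap + contig_b
    expected = -gap
    longest = min(len(contig_a), len(contig_b))
    candidates = range(max(1, expected - slack), min(expected + slack, longest) + 1)
    # Try overlap lengths closest to the estimate first.
    for k in sorted(candidates, key=lambda k: abs(k - expected)):
        if contig_a[-k:] == contig_b[:k]:
            return contig_a + contig_b[k:]
    return None


gap = estimate_gap(1000, [(800, 100), (820, 90), (790, 120)],
                   insert_size=500, read_len=50)
print(gap)
print(join_contigs("ACGTTGCA", "GCATTAGC", gap=-3))
```

The median is used here because a few mis-mapped mate-pairs would otherwise skew a mean-based estimate. Real mate-pair libraries also have an insert size distribution and read orientation rules that this sketch ignores.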
Keywords/Search Tags: DNA contig, BWT, Mate-pair, Overlap, Gap