Algorithm Research Of DNA De Novo Assembly Based On Mate-Pair

Posted on: 2012-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: W Zhao
Full Text: PDF
GTID: 2210330362951437
Subject: Computer Science and Technology
Abstract/Summary:
In the 21st century, the emergence of next-generation sequencing platforms has led to a resurgence of research on whole-genome assembly algorithms and software. Sequencing data from the Roche 454, Illumina Solexa, and ABI SOLiD platforms, which are representative of next-generation sequencing, typically have short read lengths, high error rates, and other shortcomings. These shortcomings make traditional genome assembly algorithms and software inapplicable. However, the high throughput and low cost of next-generation sequencing, and especially its success in the de novo sequencing of bacteria, have greatly encouraged research on the technology. Scaffolding is an important part of whole-genome assembly research, so there is a clear need for a new, independent scaffolding algorithm that can run on a personal computer.
Whole-genome assembly algorithms and software designed for next-generation data can be divided into two major parts. The first assembles the DNA fragments produced by the sequencing platforms into contigs; we call this the assembly process. The second assembles the contigs produced by the assembly process into scaffolds; we call this the scaffolding process. This thesis studies the whole-genome de novo assembly algorithm and, to complete it, proposes an algorithm for rapidly selecting effective mate-pairs and an algorithm for assembling contigs. All mate-pairs present on the contigs are obtained by constructing a mate-pair library. Two mapping structures are designed to obtain the association between any two contigs, and a dedicated data structure is proposed to store the number of mate-pairs between any two contigs, which completes the design and implementation of the de novo assembly algorithm.
The proposed mate-pair selection and contig assembly algorithms can run on personal computers. Their data structure design saves a large amount of memory while also improving speed. By making full use of the biological characteristics of the sequencing data, the algorithm assembled about 72% of the contigs without considering the overlap between contigs.
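The abstract does not describe the thesis's data structures in detail, so the following is only a minimal Python sketch of the general idea of recording the number of mate-pairs between any two contigs. It assumes that each read has already been mapped to at most one contig. The names `count_contig_links`, `supported_links`, `read_to_contig`, and `min_support` are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def count_contig_links(read_to_contig, mate_pairs):
    """Count mate-pairs whose two reads land on different contigs.

    read_to_contig: dict mapping a read ID to the contig ID it lies on.
    mate_pairs: iterable of (read_id, read_id) tuples from one library.
    Returns a dict mapping a sorted (contig, contig) tuple to a link count.
    """
    links = defaultdict(int)
    for left, right in mate_pairs:
        a = read_to_contig.get(left)
        b = read_to_contig.get(right)
        # Skip pairs with an unplaced read and pairs inside a single contig;
        # neither says anything about how two contigs are connected.
        if a is None or b is None or a == b:
            continue
        # Store each unordered contig pair once to keep the table small.
        key = (a, b) if a < b else (b, a)
        links[key] += 1
    return dict(links)


def supported_links(links, min_support=2):
    """Keep only contig pairs joined by at least min_support mate-pairs."""
    return {pair: n for pair, n in links.items() if n >= min_support}


# Toy input, invented for illustration (not data from the thesis).
read_to_contig = {
    "r1": "c1", "r2": "c2", "r3": "c1", "r4": "c2",
    "r5": "c3", "r6": "c1", "r7": "c1", "r8": "c1",
}
mate_pairs = [("r1", "r2"), ("r3", "r4"), ("r5", "r6"),
              ("r7", "r8"), ("r9", "r1")]

links = count_contig_links(read_to_contig, mate_pairs)
print(links)
print(supported_links(links))
```

Keying the table on unordered contig pairs means only pairs that actually share a mate-pair take up memory, which is one plausible way to keep such a structure small enough for a personal computer. A scaffolder would typically also use read orientation and insert size, which this sketch ignores.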
Keywords/Search Tags: DNA sequence, De novo sequencing, Whole-genome assembly, Mate-pair