Second-generation Sequencing Technology Based Short Reads Assembly System

Posted on:2012-08-10
Degree:Master
Type:Thesis
Country:China
Candidate:M H Sun
GTID:2120330335950037
Subject:Computer Science and Technology
Abstract/Summary:
The high throughput and low cost of second-generation sequencing technologies have revolutionized the methodology of biological research. Since Sanger sequencing was introduced in 1977, more than 1,000 bacterial genomes and 100 eukaryotic genomes, including the human genome, have been sequenced. Sequence assembly is the key step in bioinformatics analysis after genome sequencing, and high-quality assembled sequences are the foundation of that analysis. With the development of second-generation sequencing technology, sequence assembly algorithms have undergone a revolutionary change. In recent years, many sequence assembly algorithms based on next-generation sequencing technologies have been published. Even so, the constant development of sequencing technologies still demands continuous progress in assembly algorithms.
Second-generation sequencing technologies are characterized by short read length and high throughput. Traditional sequence assembly algorithms are difficult to apply to short-read assembly. Many short-read assembly algorithms and tools have therefore been developed for reads from second-generation technologies, following either a reference assembly or a de novo assembly strategy. Each strategy has its own limitations: reference assembly relies heavily on existing sequences, and de novo assembly demands high sequencing depth. How to combine the advantages of the two strategies to compensate for their individual limitations is a problem that remains to be solved.
This thesis studies and develops a combinatorial pipeline for improved genome sequence assembly using Solexa short reads. A combinatorial assembly strategy can both use existing sequence information and mine new sequence information. A large amount of sequence information in the databases can be used as reference sequences, but much sequence information remains to be discovered, so de novo assembly is still necessary. By using the combinatorial assembly strategy, we can overcome not only the difficulties posed by low-coverage regions and sequencing bias, but also the over-reliance of reference assembly on existing sequences. The combinatorial assembly strategy might become a new trend in the near future. In tests on three datasets, the assemblies produced by the combinatorial pipeline were better than those of other existing assembly tools based on a single assembly strategy, and they provide a better basis for experimental biology and subsequent bioinformatics analysis.
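The general idea of combining the two strategies can be illustrated with a toy Python sketch. It is not the pipeline developed in the thesis, whose implementation details are not given in this abstract. It assumes error-free reads, exact-match mapping on the forward strand only, and a naive greedy overlap merge for the de novo step. All function names and the `min_len` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def map_reads(reference: str, reads: List[str]) -> Tuple[List[Tuple[int, str]], List[str]]:
    """Split reads into exact matches against the reference and leftovers.

    Assumption: exact string matching stands in for a real short-read aligner.
    """
    mapped, unmapped = [], []
    for read in reads:
        pos = reference.find(read)  # first exact occurrence, or -1
        if pos >= 0:
            mapped.append((pos, read))
        else:
            unmapped.append(read)
    return mapped, unmapped


def reference_contigs(reference: str, mapped: List[Tuple[int, str]]) -> List[str]:
    """Merge overlapping or touching mapped intervals and read them off the reference."""
    intervals = sorted((pos, pos + len(read)) for pos, read in mapped)
    merged: List[List[int]] = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [reference[s:e] for s, e in merged]


def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of `a` equal to a prefix of `b`; 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Toy de novo step: repeatedly merge the pair with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no remaining pair overlaps by at least min_len
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


def combinatorial_assemble(reference: str, reads: List[str], min_len: int = 3):
    """Reference-guided contigs from mapped reads, de novo contigs from the rest."""
    mapped, unmapped = map_reads(reference, reads)
    return reference_contigs(reference, mapped), greedy_assemble(unmapped, min_len)


if __name__ == "__main__":
    ref = "ATGCGTACCTGA"
    reads = ["ATGCGT", "CGTACC", "GGATTC", "TTCAAG"]
    guided, novel = combinatorial_assemble(ref, reads)
    print("reference-guided contigs:", guided)
    print("de novo contigs:", novel)
```

In this sketch, reads that match the reference are assembled using the existing sequence, while reads that do not match are assembled independently, so sequence absent from the reference is still recovered. A realistic pipeline would additionally need to handle sequencing errors, reverse-complement reads, repeats, and uneven coverage.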
Keywords/Search Tags:Sequencing, Assembly, Reference, De novo assembly, Hybrid assembly, Short reads