
Research And Realization Of BWT-based Contig Construction Algorithm Oriented To Pairing Sequencing

Posted on: 2017-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: K Wang
Full Text: PDF
GTID: 2180330509957495
Subject: Computer technology
Abstract/Summary:
Since the completion of the Human Genome Project, researchers have come to recognize how important a complete biological gene sequence is for exploring the nature of life, and this recognition has driven the rapid development of bioinformatics. With the development of next-generation sequencing technology, current sequencing data are not only high-throughput and error-prone but also provide longer reads and paired-end information. Existing software makes little use of this new information when assembling sequences. Developing new sequence assembly software tailored to the characteristics of current sequencing data has therefore become an urgent need in bioinformatics.

Obtaining gene sequences by de novo sequencing involves two stages: contig generation and scaffold assembly. This thesis addresses contig generation in the de novo sequencing process. Because the quality of the contigs directly affects the final sequence, research on contig generation algorithms is of great significance.

This thesis presents a contig generation algorithm based on fuzzy matching. First, the algorithm uses a BWT index to find an effective set of overlapping reads. Next, it clusters these reads to find the optimal k-mer in the region and applies a tree search strategy to build a read template of consecutive k-mers, which is used to extend the contig. Finally, it makes full use of paired-end information to ensure contig quality. Read extension in this method avoids sequencing errors to a certain extent and makes full use of the information in the sequencing data. The method is then compared with a BWT-based greedy algorithm and with the de Bruijn graph-based assembler SOAPdenovo2.
Experimental results show that, compared with the existing greedy algorithm, the proposed method improves both assembly results and efficiency. Compared with SOAPdenovo2, it has a lower memory footprint and generates contigs with higher confidence, which provides more reliable input for subsequent genome assembly.
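The abstract does not say how the BWT index is queried. As a rough illustration only, the Python sketch below uses the standard FM-index backward search over a concatenated read set to collect every read that contains the last k bases of a contig, and then tallies the base that follows the match in each read. The function names (`build_index`, `backward_search`, `vote_next_base`), the naive suffix-array construction, exact k-mer matching and the simple majority vote are all assumptions made for this example. They stand in for, and are not, the thesis's fuzzy matching, read clustering and tree search.

```python
from collections import Counter


def build_index(reads):
    """Build a naive suffix array, BWT and FM-index tables for a read set.

    Hypothetical helper. Reads are joined with '$' separators, which sort
    before A/C/G/T. The naive suffix sort only suits tiny illustrative inputs.
    """
    text = "$".join(reads) + "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character is the one preceding each sorted suffix; text[-1] is '$'.
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[ch]: number of characters in text that sort strictly before ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    # read_of[pos]: index of the read that text position pos belongs to.
    read_of, rid = [], 0
    for ch in text:
        read_of.append(rid)
        if ch == "$":
            rid += 1
    return text, sa, bwt, c_table, occ, read_of


def backward_search(pattern, c_table, occ, n):
    """Return the half-open suffix-array interval [lo, hi) of exact matches."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0, 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def vote_next_base(contig, reads, k):
    """Collect reads containing the contig's last k-mer and vote on the next base.

    Hypothetical helper: exact matching plus a majority vote, used here as a
    simplified stand-in for fuzzy matching and read clustering.
    """
    text, sa, bwt, c_table, occ, read_of = build_index(reads)
    lo, hi = backward_search(contig[-k:], c_table, occ, len(bwt))
    hits, votes = set(), Counter()
    for pos in sa[lo:hi]:
        hits.add(read_of[pos])
        nxt = text[pos + k]  # always in range because text ends with '$'
        if nxt != "$":  # match ends at the read's end: no extension evidence
            votes[nxt] += 1
    return hits, votes


if __name__ == "__main__":
    # Hypothetical reads; read 3 carries a simulated error (T where others have G).
    reads = ["ACGTAC", "GTACGG", "CGTACGA", "TGTACTA"]
    hits, votes = vote_next_base("AACGTAC", reads, k=4)
    print(sorted(hits))         # [0, 1, 2, 3]
    print(votes.most_common())  # [('G', 2), ('T', 1)]
```

In this toy example the read with the simulated error is outvoted, which mirrors, in a very simplified way, the abstract's point that extending from a set of overlapping reads can tolerate some sequencing errors. A practical index would use a linear-time suffix-array or BWT construction and sampled occurrence tables rather than the full per-position arrays built here.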
Keywords/Search Tags: de novo, contig, paired-end data, fuzzy matching, BWT