Nucleotide Sequence Alignment Based On Sparse Indexing Algorithm

Posted on: 2016-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: L S Li
Full Text: PDF
GTID: 2180330464456906
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Since next-generation sequencing (NGS) technologies became the main method of DNA sequencing, researchers have been able to obtain large numbers of DNA bases rapidly and at low cost. The new challenge is how to extract useful information from this massive volume of data. On the one hand, a growing number of species have had their genomic DNA sequenced. On the other hand, the analysis of these large datasets is lagging behind, so DNA sequencing technology has not yet led to the all-round development of biotechnology. It is therefore imperative to develop better bioinformatics software, so that significant discoveries in biotechnology can better benefit mankind.

As more and more genomes are sequenced, it is often necessary to map large-scale short reads to multiple genomes. However, homologous genomic sequences are highly similar, so traditional methods, which align the short reads to each genome separately, waste a great deal of time. This thesis proposes a sparse indexing algorithm that achieves rapid alignment by aligning only specific bases when mapping two sequences. With this algorithm, similar nucleotide sequences can be extracted quickly, and multiple genomes can be merged into a non-redundant representation within a short time. Mapping short reads to the non-redundant sequence then greatly reduces the running time. In addition, the alignment ratio between short reads and the genome obtained with the sparse index is better than that of the traditional method.

The first part of the thesis introduces the background and current status of research on DNA sequence alignment, including the development of sequencing technology, efficient sequence alignment tools, and the principle of the BWT algorithm. The second part details the sparse indexing algorithm, including its principle and its implementation for sequence alignment, and discusses its advantages and disadvantages in comparison with the BWT method. The next two parts present two applications based on the algorithm. The first is mapping between genomes, which is used for species identification. The second is fast alignment of short reads, which is used for reference-based compression. The experimental results show that the method based on the sparse indexing algorithm uses less memory and runs faster than the traditional methods. Finally, the thesis discusses the development of sequencing technology and the growing accumulation of DNA sequences, which require alignment algorithms that can process massive data. Parallelization is therefore the direction for future development of the sparse indexing algorithm.
Keywords/Search Tags: Sparse Index, Alignment between Genomes, Comp Map tool, Optimal mapping for read, Read Best Map tool