Design And Implementation Of Fast DNA Alignment System Based On Burrows-wheeler Transformation

Posted on:2015-11-25

Degree:Master

Type:Thesis

Country:China

Candidate:Y D Zhou

Full Text:PDF

GTID:2180330422492277

Subject:Computer technology

Abstract/Summary:

PDF Full Text Request

With the development of the next-generation sequencing, sequencing data of DNA accumulates at a surprisingly fast speed, which is larger than the speed of data proceeded. Sequence alignment is a fundamental part of bioinformatics research, which is the first step of gene assembly and a process of matching reads to reference genomes. At present, due to the difference of methods of building index, aligning software can be divided into two kinds. One kind of aligning software such as SOAP, MAQ and so on, construct index based on hash-table and the other kind of aligning software such as BWA, Bowtie, SOAP2and so on, build index based on BWT. Sequencing data’s quick accumulation makes a challenge for sequence aligning software in the respect of aligning speed and accuracy. What was researched in this paper is sequence aligning algorithm based onBWT. Low cache hit rate is a major issue that BWT based aligners are suffering from. To solve this problem, in this manuscript, we introduced a new strategy in which we sorted the reads before performing alignments. We sorted reads lexicographically from right to left, so after sorting reads with the same suffixes would get together. Therefor when aligning next read we could use the intermediate result which were saved when aligning last read. Our statistical analysis showed that the sharing suffix length could be as long as18bp for large number of36bp short reads. It meant that we could reduce18times of unnecessary looking ups for each read. In this way we could decrease the visiting times to variables which were at a low hit radio in the cache and improve aligning speed. Except for exact matching, inexact matching allowing for two mismatches orgaps has also been done. We designed a novel method. Those inexactly matched reads were modified in error handling, then sorted and lastly exactly matched. In this process we constructed a filtering index which largely reduces the number of edited reads and saves the time consumption. To identify whether the aligning algorithm based on sorting reads was effective,we did experiments on simulative and real data, which proved our method indeed could raise the speed of alignment. Besides, our method was also compared with other three popular alignment software and we could draw a conclusion that our method presented in this paper has obvious advantage on large scales of reads, and it can obtain high value on aligning speed and accuracy in exact matching and inexact matching processes, and especially in exact matching our software has good performance.

Keywords/Search Tags:

next-generation sequencing technology, sequence alignment, BWT, DNA

PDF Full Text Request

Related items

1	Research On Sequence Alignment Methods For The Third-generation Sequencing Data
2	Design And Implementation Of Fast DNA Alignment System Based On Burrows-wheeler Transformation
3	Research On GPU-based Sequence Aligner Towards Next-generation Sequencing
4	A Sequence Alignment Algorithm With Combining Variants Data
5	Researches Of Short Sequence Alignment And Scaffold Algorithm Based On Next Generation Sequencing
6	A Design Of Short Gene Sequence Alignment Acceleration System Based On High Performance Hash Table
7	Research On DNA Sequence Fast Alignment Based On Spaced Burrows-Wheeler Transform
8	Algorithms Of Aligning The Third-Generation Sequencing Sequences And Picking The Operational Taxonomic Units
9	Genomic Structural Variants Fragments Detection Algorithm Based On Sequence Alignment Skeleton
10	Research On Calling Methods Of Structural Variation Based On Third Generation Sequencing Data