Font Size: a A A

Design And Implementation Of Fast DNA Alignment System Based On Burrows-wheeler Transformation

Posted on:2015-11-25Degree:MasterType:Thesis
Country:ChinaCandidate:Y D ZhouFull Text:PDF
GTID:2180330422492277Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the development of the next-generation sequencing, sequencing data of DNA accumulates at a surprisingly fast speed, which is larger than the speed of data proceeded. Sequence alignment is a fundamental part of bioinformatics research, which is the first step of gene assembly and a process of matching reads to reference genomes. At present, due to the difference of methods of building index, aligning software can be divided into two kinds. One kind of aligning software such as SOAP, MAQ and so on, construct index based on hash-table and the other kind of aligning software such as BWA, Bowtie, SOAP2and so on, build index based on BWT. Sequencing data’s quick accumulation makes a challenge for sequence aligning software in the respect of aligning speed and accuracy. What was researched in this paper is sequence aligning algorithm based onBWT. Low cache hit rate is a major issue that BWT based aligners are suffering from. To solve this problem, in this manuscript, we introduced a new strategy in which we sorted the reads before performing alignments. We sorted reads lexicographically from right to left, so after sorting reads with the same suffixes would get together. Therefor when aligning next read we could use the intermediate result which were saved when aligning last read. Our statistical analysis showed that the sharing suffix length could be as long as18bp for large number of36bp short reads. It meant that we could reduce18times of unnecessary looking ups for each read. In this way we could decrease the visiting times to variables which were at a low hit radio in the cache and improve aligning speed. Except for exact matching, inexact matching allowing for two mismatches orgaps has also been done. We designed a novel method. Those inexactly matched reads were modified in error handling, then sorted and lastly exactly matched. In this process we constructed a filtering index which largely reduces the number of edited reads and saves the time consumption. To identify whether the aligning algorithm based on sorting reads was effective,we did experiments on simulative and real data, which proved our method indeed could raise the speed of alignment. Besides, our method was also compared with other three popular alignment software and we could draw a conclusion that our method presented in this paper has obvious advantage on large scales of reads, and it can obtain high value on aligning speed and accuracy in exact matching and inexact matching processes, and especially in exact matching our software has good performance.
Keywords/Search Tags:next-generation sequencing technology, sequence alignment, BWT, DNA
PDF Full Text Request
Related items