Font Size: a A A

Researches On Parallel Algorithms For DNA Sequence Alignment

Posted on:2016-03-27Degree:MasterType:Thesis
Country:ChinaCandidate:Q F XueFull Text:PDF
GTID:2180330479495443Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Bioinformatics is an interdisciplinary subject which uses information technology and computer science to study a large number of complex biological data. Currently, the DNA sequences in genomics is one of the most important research areas of bioinformatics. The basic method of studying DNA sequences is sequence alignment, it looks for the similarities and homology between sequences through the order of nucleotides, and analyses the genetic and evolution information of some species. In recent years, with the development of biology, the number of DNA sequences increases exponentially, which makes serial sequence alignment algorithms cannot meet the needs of the growing data sizes. This paper focuses on parallel algorithm of sequence alignment, proposes a new sequence encoded approach and “first sequence partition” method to improve the efficiency, which can be used to solve the problem of sequence alignment on large-scale data. The main innovations of this article include:Firstly, we propose a new DNA sequence encoded approach and implement FED parallel algorithm based on MPI. This paper analyzes and compares the DNA sequences exact alignment algorithm, found that along with the increasing of data, the algorithm computing time will increase significantly. To solve this problem, we propose a sequence encoded approach based on bit operation, in order to reduce data storage space and improve the algorithm. Then, we use MPI to implement the FED parallel algorithm. The experiment shows that the speedup of the parallel algorithm can achieve 16.1 on 20 cores.Secondly, we propose the “first sequence partition” method for CVoting parallel algorithm based on MPI. This paper studies Motif finding problem and challenging instances. CVoting is a representative algorithm for solving challenging instances, but it still needs more than 20 hours to solve(21,8). Therefore, we design three data partition methods and propose that “first sequence partition” is the most suitable way. This makes the computing time of(21,8) reduce to 20 minutes and the speedup keeps linear growth from 1 core to 128 cores, and the speedup achieves 96.2 on 128 cores.
Keywords/Search Tags:parallel computing, DNA sequence alignment, Motif finding, MPI
PDF Full Text Request
Related items