Researches On Parallel Algorithms For DNA Sequence Alignment

Posted on:2016-03-27

Degree:Master

Type:Thesis

Country:China

Candidate:Q F Xue

Full Text:PDF

GTID:2180330479495443

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

Bioinformatics is an interdisciplinary subject which uses information technology and computer science to study a large number of complex biological data. Currently, the DNA sequences in genomics is one of the most important research areas of bioinformatics. The basic method of studying DNA sequences is sequence alignment, it looks for the similarities and homology between sequences through the order of nucleotides, and analyses the genetic and evolution information of some species. In recent years, with the development of biology, the number of DNA sequences increases exponentially, which makes serial sequence alignment algorithms cannot meet the needs of the growing data sizes. This paper focuses on parallel algorithm of sequence alignment, proposes a new sequence encoded approach and “first sequence partition” method to improve the efficiency, which can be used to solve the problem of sequence alignment on large-scale data. The main innovations of this article include:Firstly, we propose a new DNA sequence encoded approach and implement FED parallel algorithm based on MPI. This paper analyzes and compares the DNA sequences exact alignment algorithm, found that along with the increasing of data, the algorithm computing time will increase significantly. To solve this problem, we propose a sequence encoded approach based on bit operation, in order to reduce data storage space and improve the algorithm. Then, we use MPI to implement the FED parallel algorithm. The experiment shows that the speedup of the parallel algorithm can achieve 16.1 on 20 cores.Secondly, we propose the “first sequence partition” method for CVoting parallel algorithm based on MPI. This paper studies Motif finding problem and challenging instances. CVoting is a representative algorithm for solving challenging instances, but it still needs more than 20 hours to solve(21,8). Therefore, we design three data partition methods and propose that “first sequence partition” is the most suitable way. This makes the computing time of(21,8) reduce to 20 minutes and the speedup keeps linear growth from 1 core to 128 cores, and the speedup achieves 96.2 on 128 cores.

Keywords/Search Tags:

parallel computing, DNA sequence alignment, Motif finding, MPI

PDF Full Text Request

Related items

1	Study On Multiple Sequence Alignment And Motif Discovering In Bioinformatics
2	Research On Global Sequence Alignment For Intel Multi-core And Many-core Platforms
3	Study Of Fast Gene Sequence Alignment Method Based On Parallel Computing
4	Parallel DNA Read Mapping Algorithms On Multi-core And Many-core Architecture
5	Parallel Optimization Design And Implementation Of Biological Sequence Alignment Algorithm
6	Parallel Design And Optimization Of A Galaxy Group Finding Algorithm On SGI Vs. Distributed-memory Cluster
7	Study Of Dna Sequence Motif Identification Algorithm
8	Research On Algorithms And Theories In Genomic Sequence Annotation
9	Research And Implementation Of High-Throughput Sequencing Alignment Method Based On Distributed Computing
10	Research On Sequence Alignment Algorithms For Large-scale Sequencing Datasets