Font Size: a A A

Research On Parallel Optimization Method Of Biological Information Sequence Alignment In Multi - Core Environment

Posted on:2016-08-12Degree:MasterType:Thesis
Country:ChinaCandidate:X Y ZhangFull Text:PDF
GTID:2270330461987181Subject:Software engineering
Abstract/Summary:PDF Full Text Request
With the development of big data, it has been a hot topic that how to improve the computational efficiency. Also with the increasing of biological database, it causes the change for the original serial computing model. While increasing the frequency and multi-core structure has become mainstream parallel computing instead of it. Unlike specific hardware requirements of GPU, multi-core structure in data transmission or portability both have an advantage on the upper and prospects, this paper will use Open MP on multi-core platforms which is widely used to parallel BLASTN, and to reduce the time expenses the Trie which based preprocessing mechanism and scheduling assignment algorithms is used on parallel computing.Firstly, this paper proposes preprocessing algorithm based on the Trie structure, the main idea is to use the same word in Trie filter group, simplify database processing, and reduce the number of matches in BLASTN algorithm. Pretreatment mechanisms in this paper include the original database which is divided into a plurality of small database, and then let the target sequence into a length of W’s word hash table to establish Trie storing the same block. Experimental results show that the establishment of the Trie pretreatment mechanisms performs better where the database is larger, however, it plays a a certain role in parallel algorithm for optimizing BLASTN.Secondly, this paper researches on the BLASTN algorithm and analysis of the parallel feasibility by using perf to analysis hot function of BLASTN to parallel reconstruction of BLASTN. BLASTN in parallel to the main in seed stage and extend the matching stage, the former query sequence of word formation can be divided into stages and the query sequence is group of words with the target word sequence comparison group get high word group(HSP) stage and parallelization. At the same time, using multiple core calculate task amount; the latter to extending matching stage about at the same time extension of matching and merging the position of HSP search tree on the contiguous word group, reduce repeat matching times, and enable the parallelization of BLASTN accelerated. Experiments show that the best case and parallel the BLASTN algorithm of time compared with the original reduced nearly half, namely the acceleration ratio is 2, but with the increase of sequence databases, accelerate ratio curve will be continued to rise.Finally, considering the processors multi core computing tasks scheduling, this paper proposes algorithm ZD based on stack cycle scheduling allocation, a measure of task size benchmark with the length of the sequences in the database. Experiments show that the scheduling algorithm in general equilibrium quantity allocation and scheduling of calculation, even in the worst case, the ZD algorithm does not affect badly compared to the no efficiency scheduling algorithm.
Keywords/Search Tags:Multi-core, OpenMP, Biological information, Sequence alignment, BLASTN
PDF Full Text Request
Related items