
Study Of Fast Gene Sequence Alignment Method Based On Parallel Computing

Posted on: 2016-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: R Yang
GTID: 2180330461457363
Subject: Biomedical engineering
Abstract/Summary:
Gene sequence alignment is an important means of bioinformatics analysis. Since the successful completion of the Human Genome Project, sequencing technology has developed rapidly and sequencing costs have fallen significantly, making personal genome sequencing feasible; the computational demand for sequence alignment is accordingly shifting toward whole-genome comparison. To keep pace with high-throughput sequencing, fast and accurate sequence alignment methods are required. Current gene sequence service platforms, including the BLAST service of the US National Center for Biotechnology Information (NCBI), cannot meet the new requirements of large computational load and high accuracy. Parallel computing, as a useful means of scheduling computing resources efficiently, is increasingly used in large-scale gene sequence alignment. mpiBLAST, the parallel version of NCBI BLAST, greatly accelerates computation by parallelizing over the reference genome, but it demands too many computing resources and leaves room for improvement when processing the genome under test. Therefore, to meet the new requirements of gene sequence alignment more effectively, this thesis designs a fast and efficient alignment method that applies parallel computing to the genome under test.
Details are as follows:
1) Based on the idea of the BLAST algorithm, the thesis first analyzes the requirements that parallel processing places on sequence alignment algorithms, selects a fast and accurate short-read alignment algorithm, and improves it to make it better suited to parallel computing;
2) It designs static and dynamic parallel allocation strategies for the genome under test;
3) It analyzes the speedup obtained from parallel computing and from optimized data transmission. Through analysis of static allocation and repeated simulated data-transmission experiments, it compares the advantages and disadvantages of the static and dynamic allocation strategies and establishes a method for selecting the optimal computing strategy from the length of the sequence under test and the maximum read length;
4) It develops an online analysis tool for gene sequence alignment with parallel computing.
Using whole-genome sequence data from E. coli, yeast, fruit fly and other organisms, the thesis compares and assesses the speedup, the data-transmission optimization and the parallel task allocation of the proposed design. The results indicate that the proposed method schedules computing resources more efficiently, reduces the data-transmission load for the genome under test, lowers the cache demand on the computers that carry out the computation, and meets the new requirements of sequence alignment more effectively. It also offers a viable route for promoting parallelized gene sequence analysis applications.
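The difference between static and dynamic allocation described in items 2) and 3) can be illustrated with a small sketch. It is not the thesis's implementation: the function names, the fixed block size, the per-worker speed model, and the choice to extend each segment by the maximum read length minus one (so that a read spanning a boundary still lies inside one segment) are all assumptions made for illustration.

```python
import heapq

def static_chunks(seq_len, n_workers, max_read_len):
    """Static allocation: one near-equal segment per worker, decided up front.
    Each segment is extended by max_read_len - 1 (assumed overlap rule)."""
    overlap = max_read_len - 1
    base, extra = divmod(seq_len, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        end = start + base + (1 if w < extra else 0)
        chunks.append((start, min(end + overlap, seq_len)))
        start = end
    return chunks

def fixed_blocks(seq_len, block_len, max_read_len):
    """Cut the sequence into fixed-size blocks (with the same assumed overlap)
    that a dynamic scheduler can hand out one at a time."""
    overlap = max_read_len - 1
    return [(s, min(s + block_len + overlap, seq_len))
            for s in range(0, seq_len, block_len)]

def simulate_dynamic(blocks, speeds):
    """Dynamic allocation: the next block goes to whichever worker is free first.
    speeds[w] is a hypothetical throughput in bases per time unit."""
    heap = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(len(speeds))}
    for start, end in blocks:
        free_at, w = heapq.heappop(heap)
        assignment[w].append((start, end))
        heapq.heappush(heap, (free_at + (end - start) / speeds[w], w))
    makespan = max(t for t, _ in heap)
    return assignment, makespan

def simulate_static(chunks, speeds):
    """Finish time of static allocation: the slowest worker sets the total."""
    return max((end - start) / v for (start, end), v in zip(chunks, speeds))

if __name__ == "__main__":
    chunks = static_chunks(seq_len=100, n_workers=4, max_read_len=10)
    blocks = fixed_blocks(seq_len=100, block_len=30, max_read_len=10)
    print(chunks)
    print(simulate_dynamic(blocks, speeds=[1.0, 1.0]))
```

In this toy model, static allocation needs a single transfer per worker but ties the finish time to the slowest worker, while dynamic allocation balances load at the price of more, smaller transfers. That trade-off is the kind of comparison a strategy selector based on sequence length and maximum read length would have to weigh.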
Keywords/Search Tags: gene sequence alignment, parallel computing, allocation strategy, personal genome, online analysis