The Research And Implementation Of Pairwise Comparison Task Parallel Of Gene Sequences On Spark

Posted on:2019-01-17

Degree:Master

Type:Thesis

Country:China

Candidate:Y Gao

Full Text:PDF

GTID:2370330566991178

Subject:Agriculture

Abstract/Summary:

With the development of high-throughput sequencing technology, sequence data is growing exponentially, and analyzing and mining valuable information from these data has become a hot research topic. In bioinformatics, sequences with high similarity are identified by pairwise sequence alignment, and these similar sequences are then compared further to predict homology among multiple sequences. However, comparing massive numbers of sequences exhaustively is complex and time-consuming. To improve the efficiency and scalability of pairwise comparison, this thesis studies the parallelization of pairwise sequence alignment based on big data technology. The main work is as follows: (1) The Blast pairwise alignment algorithm is implemented on a single machine. The execution steps of the original software are simplified, and the results are consistent with those of the original software. (2) Using the principle of equal division, pairwise alignment tasks are parallelized on a Linux cluster, which improves comparison efficiency over single-machine operation. (3) Based on configuration files for the comparison tasks, the Blast algorithm is invoked through the pipe mechanism of the Spark framework, so that pairwise alignment tasks are processed on Spark. A Spark cluster with 16 nodes is built on the vSphere virtualization platform, and a large number of comparative experiments are carried out on a single machine, the Linux cluster, and the Spark cluster. The experimental data show that the total run time of pairwise alignment on the Spark cluster is less than in the single-machine and Linux cluster environments. Moreover, as the number of cluster computing nodes increases, the Spark-based approach becomes more efficient, which demonstrates its scalability.
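The equal-division idea in item (2) can be illustrated with a minimal sketch. This is not the thesis code: it assumes that the tasks are all unordered pairs of sequences and that "equal division" means chunk sizes differing by at most one. The function names `pairwise_tasks` and `split_equally` and the sequence identifiers are hypothetical.

```python
from itertools import combinations


def pairwise_tasks(seq_ids):
    """Return every unordered pair of sequence IDs (assumed all-to-all comparison)."""
    return list(combinations(seq_ids, 2))


def split_equally(tasks, n_workers):
    """Split tasks into n_workers contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(tasks), n_workers)
    chunks = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional task each.
        size = base + (1 if worker < extra else 0)
        chunks.append(tasks[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    # Hypothetical sequence identifiers, for illustration only.
    tasks = pairwise_tasks(["s1", "s2", "s3", "s4", "s5"])
    for worker, chunk in enumerate(split_equally(tasks, 3)):
        print(worker, chunk)
```

In a Spark setting along the lines of item (3), each chunk could correspond to one RDD partition whose task lines are streamed to an external alignment program through `RDD.pipe`. How the thesis lays out its task configuration files is not described in the abstract, so that mapping is only an assumption.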

Keywords/Search Tags:

Gene sequence, Spark, Pairwise comparison, Blast, Task parallel

Related items

1	The Parallelization Research Of Genomics Data Comparison Algorithm And The Construction Of Comparison Platform Based On Spark
2	The Research And Implementation Of The Distributed Parallel Blast Algorithm That Is A Gene Sequence Alignment Algorithm Based On Hadoop Platform
3	DNA Sequence Splicing Algorithm Based On Spark
4	Some Limit Properties For Pairwise Nqd Sequence
5	Research And Implementation Of Seismic Big Data Parallel Processing System Based On Spark
6	Parallel Computing Of Spark-based Geospatial Analysis Algorithms
7	Spark Task Scheduling With Immovable Data
8	Cloning And Homologic Analysis Of Part Sequence Of Rice Blast Resistance Gene
9	Study Of Fast Gene Sequence Alignment Method Based On Parallel Computing
10	Study On All-to-all Comparison Problems And Parallelization Of Gene Sequence Alignment Algorithms