
The Parallelization Research Of Genomics Data Comparison Algorithm And The Construction Of Comparison Platform Based On Spark

Posted on: 2020-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Liu
GTID: 2370330578956455
Subject: Software engineering
Abstract/Summary:
In recent years, high-throughput sequencing technology has greatly advanced bioinformatics, and genome sequence alignment has become an important part of bioinformatics data analysis. BLAST (Basic Local Alignment Search Tool) is a widely used local alignment algorithm that keeps running time relatively low while maintaining high accuracy. However, when comparing large high-throughput sequencing datasets, BLAST runs into performance bottlenecks and its alignment efficiency is low. To address these bottlenecks, this thesis presents Spark-BLAST, a distributed parallel method built on mainstream big data technology. Taking advantage of Spark's in-memory computing, the method identifies, partitions, and processes gene sequences, while the Apache YARN resource scheduler handles task scheduling and resource allocation, so that BLAST runs as a distributed parallel computation. The method was validated by comparing the results of a 5-node Spark cluster with those of single-machine BLAST: with no change in the accuracy of the results, Spark-BLAST reached a speedup of about 4. The experimental results show that the Spark-based parallel method greatly improves the efficiency of BLAST, resolves its performance bottleneck, and gives the bioinformatics field an efficient Spark-BLAST alignment method.

The project stores and manages genome data in Hadoop's HDFS, which solves the problem of scalable, incremental storage of massive high-throughput genome data. Finally, a web front end gives the project a gene alignment platform with a convenient graphical interface, making the work of bioinformatics researchers considerably easier.
Keywords/Search Tags: Spark, Parallelization, Sequence Alignment, Big Data, Bioinformatics, BLAST (Basic Local Alignment Search Tool)
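The abstract describes the parallelization only at a high level: query sequences are divided up, aligned in parallel, and the results gathered. The following minimal sketch illustrates that split, align, and merge pattern in plain Python, so it runs without a cluster. Every detail is an assumption made for illustration and is not taken from the thesis. A thread pool stands in for Spark executors scheduled by YARN. A toy exact k-mer seed count stands in for a real BLAST search. The function names (`seed_hits`, `partition`, `parallel_align`, `speedup_and_efficiency`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str]  # (sequence_id, nucleotide_sequence)
Hit = Tuple[str, int]     # (subject_id, shared_kmer_count)


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seed_hits(query: str, db: Dict[str, str], k: int) -> List[Hit]:
    """Hypothetical toy stand-in for one BLAST search: count exact k-mer
    seeds shared with each database sequence. Real BLAST also extends
    seeds into scored local alignments; this sketch stops at seeding."""
    q = kmers(query, k)
    hits = [(sid, len(q & kmers(seq, k))) for sid, seq in db.items()]
    # Keep subjects with at least one seed, best first, ties broken by id.
    return sorted((h for h in hits if h[1] > 0), key=lambda h: (-h[1], h[0]))


def partition(records: List[Record], n: int) -> List[List[Record]]:
    """Round-robin split of query records into n partitions."""
    parts: List[List[Record]] = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def align_partition(part: List[Record], db: Dict[str, str],
                    k: int) -> List[Tuple[str, List[Hit]]]:
    """Work unit for one worker: search every query in its partition
    against the full (replicated) database."""
    return [(qid, seed_hits(seq, db, k)) for qid, seq in part]


def parallel_align(queries: List[Record], db: Dict[str, str],
                   n_workers: int = 5, k: int = 4) -> Dict[str, List[Hit]]:
    """Split queries, search partitions concurrently, merge by query id.
    Threads only illustrate the decomposition. In CPython they will not
    speed up this pure-Python work, whereas Spark runs tasks in separate
    executor processes across nodes."""
    parts = partition(queries, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(align_partition, parts,
                           [db] * len(parts), [k] * len(parts))
        merged: Dict[str, List[Hit]] = {}
        for part_result in results:
            merged.update(part_result)  # (query_id, hits) pairs
    return merged


def speedup_and_efficiency(t_serial: float, t_parallel: float,
                           nodes: int) -> Tuple[float, float]:
    """Speedup S = T_serial / T_parallel; parallel efficiency E = S / nodes."""
    s = t_serial / t_parallel
    return s, s / nodes
```

For example, `parallel_align([("q1", "ACGTACGTGG"), ("q2", "TTTTCCCC")], {"s1": "ACGTACGT", "s2": "GGGGTTTT"}, n_workers=2)` returns `{"q1": [("s1", 4)], "q2": [("s2", 1)]}`. In an actual Spark job, the per-partition step would roughly correspond to a transformation such as `mapPartitions` over an RDD of query records, with each partition invoking BLAST. The abstract does not say which Spark operators Spark-BLAST actually uses. Splitting the queries and replicating the database to every worker keeps tasks independent until the merge. Using the abstract's reported figures, a speedup of about 4 on 5 nodes corresponds to a parallel efficiency of roughly 4 / 5 = 0.8.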