The Research On Parallel Sequence Alignment Tool For Third-generation Sequencing

Posted on: 2023-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Wang
GTID: 2530307097494844
Subject: Computer technology
Abstract/Summary:
The development of third-generation sequencing technology has brought significant changes and far-reaching impact to genomics. Compared with second-generation sequencing, third-generation sequencing produces longer sequences at lower cost and in larger volumes, yielding a large number of long reads. A core challenge in analyzing these data is sequence alignment, which is very time-consuming. However, most existing long-read alignment tools are serial programs, and their performance and time and space efficiency are limited. Fast and accurate alignment of large-scale long-read data has therefore become a significant challenge. Traditional data analysis platforms and serial methods cannot handle alignment at this scale effectively, so there is an urgent need for high-performance computing platforms and efficient parallel algorithms built on them. From the perspective of parallel computing, this thesis studies parallel sequence alignment tools for third-generation sequencing. The research work is as follows:

(1) A parallel optimization algorithm is designed for minimap2, a serial alignment tool for third-generation sequencing data with excellent efficiency and accuracy. Based on the Master/Slave model, minimap2 is parallelized in four steps: sharing the reference genome index, evenly dividing the query reads, multi-level sequence alignment, and non-blocking output of alignment results.

(2) The parallel program minimapM is implemented on top of the MPI library. Some algorithm details are refined to suit the characteristics of MPI, including broadcasting only the non-zero data in the reference genome to reduce communication pressure on the nodes, and adjusting the split positions of the query reads at the base level so that no sequence is cut apart. The result is minimapM, a parallel alignment tool with better performance.

(3) The parallel program minimapR is implemented on top of Ray, a relatively new general-purpose parallel framework. Ray offers high fault tolerance, good platform compatibility, and simple deployment of the runtime environment. Combining these characteristics with the parallel optimization algorithm yields minimapR, a more general and efficient parallel alignment tool.

(4) The three parallel versions of minimap2 (minimapM, minimapR, and IMOS, based on MPI, Ray, and Spark respectively) are compared and analyzed through experimental evaluation and in light of the characteristics of each framework. Experiments show that minimapM has the best speedup, with parallel efficiency reaching 78.3% on 128 nodes. The speedup of minimapR is close to that of minimapM, with parallel efficiency reaching 72.5% on 128 nodes. The speedup of IMOS is relatively low, with parallel efficiency of 40.9% on 128 nodes, but Spark offers rich APIs, a mature ecosystem, and extensive functionality. Finally, the advantages and disadvantages of the three frameworks are compared to guide the choice of framework in practical application scenarios.
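
The abstract describes dividing the query reads evenly and then adjusting each split position so that no sequence is cut apart. The sketch below is one way that idea could look, not the thesis's implementation. It assumes FASTA input held in memory as bytes, with each record starting with '>' at the beginning of a line; the function name split_fasta and its parameters are hypothetical.

```python
def split_fasta(data: bytes, n_parts: int) -> list[tuple[int, int]]:
    """Return n_parts (start, end) byte ranges that cover `data`.

    Each cut starts at an even division of the byte length and is then
    moved forward to the next record header, so no record is split.
    Assumption: FASTA input, where a record begins with '>' at line start.
    """
    size = len(data)
    cuts = [0]
    for i in range(1, n_parts):
        pos = size * i // n_parts  # naive even split position
        on_header = (
            pos > 0
            and data[pos:pos + 1] == b">"
            and data[pos - 1:pos] == b"\n"
        )
        if pos > 0 and not on_header:
            # Advance to the next '>' that follows a newline.
            nxt = data.find(b"\n>", pos)
            pos = size if nxt == -1 else nxt + 1
        # Keep cuts monotonic; a chunk may end up empty on tiny inputs.
        cuts.append(max(pos, cuts[-1]))
    cuts.append(size)
    return [(cuts[i], cuts[i + 1]) for i in range(n_parts)]


if __name__ == "__main__":
    reads = b">r1\nACGT\n>r2\nGGCC\nTTAA\n>r3\nA\n"
    for start, end in split_fasta(reads, 2):
        print(start, end, reads[start:end])
```

In a Master/Slave setup, the master could compute these ranges once and send each worker only its (start, end) pair. Chunks are balanced by bytes rather than by read count, which is a reasonable proxy for alignment work when read lengths vary widely.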
Keywords/Search Tags: Sequence alignment, Parallel optimization, Third-generation sequencing, MPI, Ray