
Parallel Alignment Of Third-generation Sequencing For Large Scale Heterogeneous Systems

Posted on: 2022-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Xia
Full Text: PDF
GTID: 2530307169983289
Subject: Computer technology
Abstract/Summary:
Single Molecule Real-Time (SMRT) sequencing is one of the most actively studied third-generation sequencing technologies. Compared with next-generation sequencing, SMRT can detect single molecules and produces much longer reads. Its average read length can reach 10 kbp, about two orders of magnitude longer than that of second-generation sequencing. Longer reads greatly ease the detection of structural variation and genome assembly, but they also bring new challenges to sequence analysis. Sequence alignment is the central and fundamental problem in many sequence analysis procedures, and local alignment is often the kernel of these algorithms. Local alignment finds the best-matching subsequences between the given sequences. It is usually based on the Smith-Waterman algorithm, whose time complexity is O(n^2). Because single-CPU performance has reached a bottleneck, serial sequence alignment programs fall far short of the requirements of SMRT sequence alignment. To solve this problem, this thesis uses MPI, OpenCL parallel computing, and heterogeneous computing technology to implement a parallel third-generation sequence alignment algorithm for large-scale heterogeneous systems, which effectively improves the performance of third-generation sequence alignment. The contributions of this thesis can be summarized as follows:

(1) Analysis and comparison of parallel sequence alignment algorithms. This thesis first surveys the existing parallel sequence alignment algorithms and compares them from the perspectives of vector-level parallelism, thread-level parallelism, process-level parallelism, and heterogeneous parallelism. It focuses on how the data layout of the alignment matrix affects alignment performance, and analyzes the current problems and development trends of parallel sequence alignment algorithms. It concludes that large-scale heterogeneous parallel acceleration is the main way to improve the performance of alignment algorithms, and this conclusion determines the research approach of the following two chapters.

(2) Implementation and optimization of a large-scale parallel algorithm for third-generation sequence alignment. The existing third-generation sequence alignment algorithms can run only on a single node, so they cannot cope effectively with the rapid growth of sequencing data. To solve this problem, this thesis develops PrHAT, a large-scale parallel sequence alignment algorithm based on MPI. It designs and implements a reference sequence preprocessing strategy suited to multi-node data distribution and a read preprocessing strategy for multi-node sequence allocation, and it optimizes multi-level parallel sequence alignment through group partitioning. Experimental results show that PrHAT performs well. When 16 nodes are used to test human genome data, the running time of the algorithm is reduced from 2972 seconds to 213 seconds, the speedup reaches 14.87x, the efficiency remains above 93%, and the weak scalability remains above 80%.

(3) Implementation and optimization of a general heterogeneous parallel algorithm for third-generation sequence alignment. To further improve the performance of third-generation sequence alignment, this thesis builds on PrHAT and uses OpenCL to implement a general third-generation sequence alignment algorithm for heterogeneous platforms, named OpenPrHAT. To account for the communication overhead between the host and the devices during computation, OpenPrHAT selects the device according to the sequence length, and the program is optimized accordingly. OpenPrHAT is tested on the same data set as the original algorithm. Compared with PrHAT, it achieves a performance improvement of 5% to 10% on a single node and about 3% on multiple nodes. Another advantage of OpenPrHAT is that it can run on multiple kinds of devices, which gives it good portability.
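
As an illustration of the local alignment kernel mentioned in the abstract, the following is a minimal serial sketch of the Smith-Waterman algorithm with a linear gap penalty. It is not the thesis's PrHAT or OpenPrHAT code. The function name `smith_waterman` and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are assumptions chosen for the example. The nested loop over the score matrix is the source of the quadratic cost noted above.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions, not taken from the thesis.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix: O(len(a) * len(b)) time and memory.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # match or mismatch
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("XXACGTXX", "YYACGTYY"))  # (8, 'ACGT', 'ACGT')
```

Each cell depends only on its upper, left, and upper-left neighbors, so cells on the same anti-diagonal are independent of one another. This dependency pattern is what vector-level and device-level parallel implementations typically exploit.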
Keywords/Search Tags: SMRT, sequence alignment, Smith-Waterman algorithm, parallel computing