
Research On Hybrid Sequence Alignment Algorithm For Biology Database Based On Sunway Platform

Posted on: 2021-02-28  Degree: Master  Type: Thesis
Country: China  Candidate: H Zhang
GTID: 2480306032465244  Subject: Computer software and theory
Abstract/Summary:
Biological sequence alignment is a basic building block of bioinformatics research. It rests on the general biological rule that sequence determines structure and structure determines function. Nucleic acid sequences and protein primary-structure sequences are treated as strings over a basic alphabet. The goal is to measure the similarity between sequences, discover the relationships among them, and from there explore the functional, structural, and evolutionary information they carry. With the explosive growth of biological databases, the cost of sequence alignment has also risen rapidly, leading to a large increase in computing time. To speed up biological database searches, they are usually run on large-scale supercomputers.

Sunway TaihuLight is the world's first heterogeneous supercomputer with a peak performance of more than 100 PFlops. It is built entirely from the domestic heterogeneous many-core SW26010 processor and provides a brand-new hardware platform for biological database search. Many high-performance applications have already been ported to and optimized for the Sunway platform with good speedups, and some of them have won the Gordon Bell Prize. However, because of the platform's special heterogeneous many-core architecture, existing sequence alignment algorithms cannot be accelerated on it directly.

This thesis proposes a hybrid sequence alignment search algorithm for biological databases on the domestic many-core platform. The sequence alignment algorithm is the core of a database search program, so the work targets it directly: the proposed hybrid algorithm combines the Smith-Waterman (SW) local alignment algorithm with the Needleman-Wunsch (NW) global alignment algorithm. The hybrid algorithm is implemented with the Message Passing Interface (MPI) and the accelerated threading library (Athread). To make full use of the SW26010 processor, the ported algorithm is optimized in three respects, based on the hardware characteristics of the Sunway many-core architecture and the software characteristics of the hybrid alignment algorithm: compilation optimization, many-core memory optimization, and load balancing. These optimizations effectively improve the running efficiency of the algorithm.

The algorithm is tested on a single node and on multiple nodes using Swiss-Prot, a protein sequence database maintained by EBI (European Bioinformatics Institute). The experimental results show that the algorithm makes effective use of the special hardware architecture of the SW26010 processor. On a single node it achieves a maximum speedup of 15.92 over the MPE (management processing element) version on Sunway and a speedup of 4.33 over the Intel Xeon E5620 platform. Multi-node tests of the parallel algorithm were also performed on the Sunway TaihuLight platform. With 64 nodes, the speedup of the proposed algorithm exceeds 1000 times. The results show that the parallel implementation of the hybrid sequence alignment algorithm has good scalability.
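For readers unfamiliar with the two algorithms named in the abstract, the following is a minimal Python sketch of their score computations. It is not the thesis implementation: the abstract does not say how SW and NW are combined, so the sketch shows only the two building blocks side by side. The function names and the scoring parameters (`match`, `mismatch`, and a linear `gap` penalty) are assumptions chosen for illustration. A real protein search would normally use a substitution matrix and affine gap penalties.

```python
def nw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch: score of the best global alignment of a and b.

    Scoring values are illustrative assumptions, not taken from the thesis.
    """
    # Row 0: aligning the empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # Column 0: a[:i] against the empty prefix of b.
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                           prev[j] + gap,       # gap in b
                           cur[j - 1] + gap))   # gap in a
        prev = cur
    return prev[-1]


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman: score of the best local alignment of a and b.

    Same recurrence as NW, except that cells are clamped at 0 and the
    answer is the maximum cell anywhere in the matrix.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(0,
                       prev[j - 1] + sub,
                       prev[j] + gap,
                       cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


def search(query, database, score=sw_score):
    """Rank database sequences by alignment score against the query.

    A hypothetical helper: each query/subject pair is scored independently,
    which is what makes database search straightforward to parallelize.
    """
    return sorted(database, key=lambda s: score(query, s), reverse=True)
```

Both functions keep only two rows of the dynamic programming matrix at a time, so memory grows with the length of one sequence rather than with the product of both lengths. That matters on processors with small per-core local memory, although the memory optimizations actually used in the thesis are not described in the abstract.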
Keywords/Search Tags:Sunway TaihuLight, Sunway many-core processor, Biology database search, Sequence alignment, Optimization