
Research And Development Of Performance Improvement Method In Indel Realignment Model Of NGS Procedure

Posted on: 2017-08-26 | Degree: Master | Type: Thesis
Country: China | Candidate: L Zhang | Full Text: PDF
GTID: 2310330491964255 | Subject: Software engineering
Abstract/Summary:
The development of high-throughput sequencing technology has advanced research on human genetic information, and the analysis and detection of insertions/deletions (Indels) is an important part of that research. Indel detection in metagenomics, genome-wide association studies, and the detection and prevention of genetic diseases with targeted sequencing places high demands on both diversity and efficiency, so an Indel analysis program that focuses on site-specific analysis and runs efficiently has become an increasingly urgent practical requirement. This thesis implements an Indel Realignment module based on sequence alignment and parallel computing. The main contents are as follows:
(1) To meet the diversity requirements of Indel detection, the SW sequence alignment algorithm is analyzed under different alignment strategies, with respect to both the rate of successful sequence alignment and the screening of Indels of different lengths, and the results are counted and compared statistically. In the experiments, the success rate of sequence alignment followed a normal distribution when different pass rates were set in the module's parameters. During sequence alignment, different Indel scoring strategies lead to different judgments of Indel size and position in the final data.
(2) To meet the efficiency requirements of Indel detection, two optimization strategies, one fine-grained and one coarse-grained, are discussed based on how the scoring matrix is built in the SW sequence alignment algorithm. In the fine-grained strategy, the TWF strategy works in tile units, which improves the efficiency of building the scoring matrix and relieves the IO performance bottleneck caused by a high data communication ratio. In the coarse-grained strategy, an in-memory buffer queue eases the IO pressure caused by frequent calls to inefficient IO operations, which improves the efficiency of the module.
(3) To further improve the efficiency of the module, a sequence alignment mode with a higher degree of parallelism is explored: the Indel Realignment module is run in a cluster environment, with the sequence data divided into equal shares, and this achieves a good speedup. Based on a study of the CUDA architecture's thread organization, host communication with peripherals, and data storage model, the feasibility of running the Indel Realignment module in a heterogeneous computing environment is assessed, and the heterogeneous Indel Realignment process is described.
Finally, the Indel Realignment module proposed in this thesis was tested and verified in a practical application scenario. The results show that the data realigned by the module met the partner's requirements for both the variety of judgments and execution efficiency.
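As an illustration of how the Indel scoring strategy in point (1) can change the judgment of Indel size and position, the following is a minimal sketch of local alignment, assuming that "SW" refers to the Smith-Waterman algorithm and that a simple linear gap penalty is used. The function name `smith_waterman` and the scoring parameters `match`, `mismatch`, and `gap` are illustrative assumptions and are not taken from the thesis; the sketch does not include the tiling, buffering, or CUDA optimizations described above.

```python
def smith_waterman(ref, read, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty (illustrative sketch).

    Returns (best_score, aligned_ref, aligned_read); '-' marks a gap.
    All scoring values are assumptions chosen for the example.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    # Scoring matrix H; row 0 and column 0 stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best, bi, bj = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # match or mismatch
                          H[i - 1][j] + gap,     # gap in the reference
                          H[i][j - 1] + gap)     # gap in the read
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Trace back from the best cell until a zero score is reached.
    out_ref, out_read = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_ref.append(ref[j - 1])
            out_read.append(read[i - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_ref.append("-")               # base present only in the read
            out_read.append(read[i - 1])
            i -= 1
        else:
            out_ref.append(ref[j - 1])        # base missing from the read
            out_read.append("-")
            j -= 1
    return best, "".join(reversed(out_ref)), "".join(reversed(out_read))


if __name__ == "__main__":
    # The read lacks one 'T' relative to the reference.
    print(smith_waterman("ACGTACGT", "ACGACGT", gap=-2))
    print(smith_waterman("ACGTACGT", "ACGACGT", gap=-8))
```

With the mild gap penalty the alignment spans the whole read and reports a one-base deletion; with the harsh penalty the best local alignment avoids the gap entirely and covers only four bases, so no Indel is reported. This is one simple way a scoring strategy can shift the Indel judgment.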
Keywords/Search Tags:sequence alignment, biological information, insertion & deletion, parallel computing, CUDA