Study Of CUDA Applications In Biological Sequence Processing

Posted on:2012-06-20

Degree:Doctor

Type:Dissertation

Country:China

Candidate:W D Sun

Full Text:PDF

GTID:1220330467981062

Subject:Computer system architecture

Abstract/Summary:

PDF Full Text Request

Nowadays new high-throughput DNA sequencing technologies can yield a huge volume of sequence data in a short time with affordable price, and as a result the accumulation of biological sequence data is growing with explosion rate. At the same time, the computer processor frequency is almost reaching its theoretical upper limit, which means all existing serial algorithms can no longer improve their performance simply by using higher clock frequency as past. Parallel processing technology is one of the inevitable courses to decrease such a disparity. Since the GPUs can provide higher memory bandwidth and more computational horsepower than CPU, and the biological sequence can be classified as a large-scale data set with simple structure, the manycore GPU computing technology is adopted in the thesis to implement the low cost and high efficiency sequence processing system, which is characterized by its own data parallel processing instruction set. Several classical sequence analysis problems are elaborately picked as the study object of parallel processing according to their complexities, and then efficient data parallel algorithms for them are designed, which are subsequently implemented, analyzed and optimized on CUDA parallel computing platform. The main contributions can be summarized as follows.First, a comprehensive study of the software and hardware architecture for CUDA parallel computing is given. As GPUs have SIMD instruction set, common data parallel primitives are summarized as the reference and guide for parallel algorithm design and implementation.Second, the efficient data parallel six frame translation algorithm and sequence frequency statistics algorithm are proposed and implemented by using direct and indirect parallelization method for solving open reading frame identification, which is one of the most fundamental problems in sequence analysis domain. Through the parallelization process of ORF, the characteristics of data parallel algorithm design are exemplified, the superiority and inferiority of GPU computing are demonstrated, and the GPU parallel performance acceleration model is also introduced to measure the parallelization efficiency. A highly parallel six frame sequence translation program running on GPU is depicted by using direct transform method. The parallel kernel code runs about5times faster than serial implementations on CPU. The sequence frequency statistics process is not quite easy for parallelization because it contains heavy parallel access conflict and latency. So the indirect parallelization method is employed by using sorting and prefix-summing operations as replacement. The indirect parallel version can perform as almost the same as the serial counterpart in general. The overall efficiency of parallel open reading frame translation algorithm is about2times faster than the serial CPU version.Third, a data parallel algorithm for fast finding of short and exact repeats is proposed for repeats detection problem in genome sequencing and analysis, which is implemented by employing dictionary order construction method and runs one magnitude faster than existing serial algorithms. Repeat regions in DNA play very important roles in many vital biological functions, and the repeats finding is always deemed as another fundamental problem in bioinformatics. The design and implementation issues of the proposed algorithm on CUDA platform are depicted in detail. By using these short repeats as seeds, a heuristic data parallelization solution of Hamming distance and edit distance repeats detection is further investigated with a feasible parallel threads scheduling plan. The step by step parallelization method employed during the parallel algorithm design is carried out according to functional code blocks of the serial algorithm. This method can not only fast identify the bottlenecks of parallel processing for parallelization difficulty validation, but also is convenient for parallel program design, debug, analysis and optimization, which can serve as the guide for the parallelization of most existing serial codes.Fourth, a new parallel prefix-doubling suffix array construction algorithm is proposed by replacing common group strategy with sorting for suffix array construction problem, whose parallel execution efficiency is much better than the serial counterparts. Suffix array is widely used in sequence analysis, string matching, compression, etc. Over the last few years, suffix array algorithms for construction and application have constituted an intense area of research within computer science. After the analysis and comparison of existing serial algorithms, a more compact parallel prefix-doubling suffix array construction algorithm is proposed by employing abstraction and functional alternation methodology, which is suitable for GPU computing. The proposed algorithm can run independently for parallel suffix array construction and also is compatible and cooperative with existing serial prefix-doubling algorithms for peak performance (about3times faster than the totally serial counterpart). The experiment results show that the proposed algorithm is simple, fast and scalable for real-world data processing, which is one of the fastest algorithms running on a commodity PC. It is especially efficient when processing small alphabet biological data, which is faster than all existing serial algorithms.

Keywords/Search Tags:

CUDA, GPGPU, ORF, repeats finding, regulator detection, suffix arrayconstruction, prefix-doubing

PDF Full Text Request

Related items

1	Research On Change Point Detection Algorithm Based On Variance Of AUC And Its Parallelization
2	White Light Interferometry For Fast Areal Surface Measurement Based On GPGPU
3	Studied On Gene Sequence Alignment Based On Mixed Suffix Tree And Suffix Array
4	LM-Suffix: Research On Gene Sequence Index Structure Based On Suffix Tree
5	Rties Of Overlapping Words And Disjunctive Language Of Them
6	The Research On Optimizing Sparse Matrix Computation Based On GPU
7	Design And Implementation Of Parallel PHYLIP Based On The Multi-Core Computing Environment
8	Extraction Of Geographic Information Elements And Spatial Positioning Method For The Webpage Text
9	Comparison Of DNA Sequences Based On Common Prefix Identifiers And Their Lengths And Positions
10	Novel Multiple-point Geostatistical Algorithms Based On Graphics Processing And Model Evaluation