Font Size: a A A

Study Of CUDA Applications In Biological Sequence Processing

Posted on:2012-06-20Degree:DoctorType:Dissertation
Country:ChinaCandidate:W D SunFull Text:PDF
GTID:1220330467981062Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
Nowadays new high-throughput DNA sequencing technologies can yield a huge volume of sequence data in a short time with affordable price, and as a result the accumulation of biological sequence data is growing with explosion rate. At the same time, the computer processor frequency is almost reaching its theoretical upper limit, which means all existing serial algorithms can no longer improve their performance simply by using higher clock frequency as past. Parallel processing technology is one of the inevitable courses to decrease such a disparity. Since the GPUs can provide higher memory bandwidth and more computational horsepower than CPU, and the biological sequence can be classified as a large-scale data set with simple structure, the manycore GPU computing technology is adopted in the thesis to implement the low cost and high efficiency sequence processing system, which is characterized by its own data parallel processing instruction set. Several classical sequence analysis problems are elaborately picked as the study object of parallel processing according to their complexities, and then efficient data parallel algorithms for them are designed, which are subsequently implemented, analyzed and optimized on CUDA parallel computing platform. The main contributions can be summarized as follows.First, a comprehensive study of the software and hardware architecture for CUDA parallel computing is given. As GPUs have SIMD instruction set, common data parallel primitives are summarized as the reference and guide for parallel algorithm design and implementation.Second, the efficient data parallel six frame translation algorithm and sequence frequency statistics algorithm are proposed and implemented by using direct and indirect parallelization method for solving open reading frame identification, which is one of the most fundamental problems in sequence analysis domain. Through the parallelization process of ORF, the characteristics of data parallel algorithm design are exemplified, the superiority and inferiority of GPU computing are demonstrated, and the GPU parallel performance acceleration model is also introduced to measure the parallelization efficiency. A highly parallel six frame sequence translation program running on GPU is depicted by using direct transform method. The parallel kernel code runs about5times faster than serial implementations on CPU. The sequence frequency statistics process is not quite easy for parallelization because it contains heavy parallel access conflict and latency. So the indirect parallelization method is employed by using sorting and prefix-summing operations as replacement. The indirect parallel version can perform as almost the same as the serial counterpart in general. The overall efficiency of parallel open reading frame translation algorithm is about2times faster than the serial CPU version.Third, a data parallel algorithm for fast finding of short and exact repeats is proposed for repeats detection problem in genome sequencing and analysis, which is implemented by employing dictionary order construction method and runs one magnitude faster than existing serial algorithms. Repeat regions in DNA play very important roles in many vital biological functions, and the repeats finding is always deemed as another fundamental problem in bioinformatics. The design and implementation issues of the proposed algorithm on CUDA platform are depicted in detail. By using these short repeats as seeds, a heuristic data parallelization solution of Hamming distance and edit distance repeats detection is further investigated with a feasible parallel threads scheduling plan. The step by step parallelization method employed during the parallel algorithm design is carried out according to functional code blocks of the serial algorithm. This method can not only fast identify the bottlenecks of parallel processing for parallelization difficulty validation, but also is convenient for parallel program design, debug, analysis and optimization, which can serve as the guide for the parallelization of most existing serial codes.Fourth, a new parallel prefix-doubling suffix array construction algorithm is proposed by replacing common group strategy with sorting for suffix array construction problem, whose parallel execution efficiency is much better than the serial counterparts. Suffix array is widely used in sequence analysis, string matching, compression, etc. Over the last few years, suffix array algorithms for construction and application have constituted an intense area of research within computer science. After the analysis and comparison of existing serial algorithms, a more compact parallel prefix-doubling suffix array construction algorithm is proposed by employing abstraction and functional alternation methodology, which is suitable for GPU computing. The proposed algorithm can run independently for parallel suffix array construction and also is compatible and cooperative with existing serial prefix-doubling algorithms for peak performance (about3times faster than the totally serial counterpart). The experiment results show that the proposed algorithm is simple, fast and scalable for real-world data processing, which is one of the fastest algorithms running on a commodity PC. It is especially efficient when processing small alphabet biological data, which is faster than all existing serial algorithms.
Keywords/Search Tags:CUDA, GPGPU, ORF, repeats finding, regulator detection, suffix arrayconstruction, prefix-doubing
PDF Full Text Request
Related items