Using GPU Parallel Computing Technology To Speed Up The Codon Usage Bias Algorithm And Its Related Application

Posted on: 2017-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Jing
Full Text: PDF
GTID: 2180330503983641
Subject: Computer application technology
Abstract/Summary:
In recent years, third-generation sequencing technology has become increasingly popular, and biology has entered the "big data" era. Many biological databases, such as NCBI, EBI and DDBJ, have been built and enriched. The exponential growth of biological data poses serious challenges for analyzing and managing it. Among the most important kinds of biological data is bio-sequence data. Bio-sequences include not only DNA and RNA sequences, which carry genetic information, but also protein sequences, which perform a vast array of functions within living organisms, and codon sequences, which are generated in the process of gene expression. Modern multi-core, multi-threaded high-performance computing devices make it possible to handle huge volumes of biological data and massive compute-intensive jobs effectively. CUDA (Compute Unified Device Architecture) and CUDA-enabled GPUs perform well on a wide range of parallel computing tasks and are widely used in compute-intensive fields such as scientific computing, computational biology, physics simulation, forecasting, and astronomy. A CUDA-enabled GPU has a large number of computing units, so it can carry out many calculations in parallel and save considerable computing time.

This study uses CUDA-enabled GPUs to speed up existing serial bioinformatics algorithms by implementing parallel versions, and sets up a parallel bioinformatics platform for Southwest University. Our research makes two major contributions:

(1) We present and implement a parallel version of the Codon Deviation Coefficient (CDC) algorithm on CUDA-enabled GPUs. Codon usage bias (CUB) is a widespread characteristic in biology. Many measures exist to quantify CUB.
Different measures have different shortcomings because they take different factors into account, such as overreliance on a reference set, no assessment of statistical significance, or incomplete treatment of background nucleotide composition. The CDC method makes up for these shortcomings. However, CDC takes a long time to analyze large amounts of codon sequence data, so we parallelized it to reduce its running time. We divided CDC into sub-modules and analyzed the data dependences and execution time of each one, then designed a parallel algorithm for each suitable module. At the same time, we adjusted the data structures so that the CUDA-enabled GPU can access them faster and more easily, and we used the cuRAND API to generate random numbers in parallel. The parallel CDC method is implemented in CUDA C. Comparing the execution times of the two methods, we find that the parallel CDC method is about 38 to 398 times faster than the original serial method. Within certain limits, the speedup grows as the amount of data to be analyzed increases. We also verify the measured speedup through a theoretical speedup analysis.

(2) We construct a bioinformatics platform that integrates the parallelized tools. With the abundance of bioinformatics software and the development of the internet, biological databases and data of all kinds have become easier to access. A comprehensive bioinformatics platform helps biologists process information more conveniently and efficiently. Current bioinformatics platforms offer a visual web interface to bioinformatics software in place of the inconvenient command line. However, most platforms provide only serial software and no Chinese-language interface, which is inconvenient for us and makes the tools slow to run.
In this thesis, we used Python and XML to modify the open-source project Galaxy and established an easy-to-use, localized bioinformatics platform. By modifying the source code or writing configuration files, we enable the platform to call the parallel CDC program described above as well as other parallelized tools. Users can easily invoke the CUDA-enabled GPU to perform parallel computing through the web interface. They can also save considerable time by using the parallelized tools inside the platform, and can construct their own bioinformatics workflows from tools for data acquisition, data conversion, data processing, and result analysis.
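
The speedup figures reported under contribution (1) can be put in context with a small calculation. The abstract does not say which theoretical model the thesis uses. The sketch below assumes Amdahl's law, one standard choice, and the function names, parallel fractions, and worker count are hypothetical values chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's-law upper bound on speedup.

    parallel_fraction: share of the serial running time that can be
        parallelized (hypothetical input, not a value from the thesis).
    workers: number of parallel processing units.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def measured_speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """Observed speedup: serial running time divided by parallel running time."""
    if serial_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("running times must be positive")
    return serial_seconds / parallel_seconds


if __name__ == "__main__":
    # Illustrative fractions only; the thesis's own measurements are not
    # given in the abstract.
    for fraction in (0.90, 0.99, 0.999):
        bound = amdahl_speedup(fraction, workers=2048)
        print(f"parallel fraction {fraction}: bound {bound:.1f}x")
```

Under this model the bound is dominated by the serial fraction. This is consistent with the abstract's observation that speedup grows with input size within certain limits, if larger inputs spend a greater share of their time in the parallelized modules. A real GPU does not behave like a simple pool of identical workers, so the result is a rough guide rather than a prediction.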
Keywords/Search Tags:CUDA, GPU, parallel acceleration, codon usage bias, bioinformatics platform