
Source-to-Source Parallelization Research Of Loop For CUDA

Posted on: 2015-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Sun
Full Text: PDF
GTID: 2298330422983515
Subject: Computer software and theory
Abstract/Summary:
In recent years, GPUs have been widely used in high-performance computing. GPU computation beyond graphics rendering is known as GPGPU. Traditional GPGPU development was difficult because the graphics API had to be used directly for programming. CUDA has since been widely adopted because it lowers the difficulty of writing parallel programs. However, developing parallel programs manually with CUDA remains a challenge, because programmers must master the GPU architecture and the CUDA programming model in depth. Reducing the difficulty of developing parallel programs is therefore important for the popularization and application of GPGPU.

This paper studies the problem of automatically generating GPU parallel programs and proposes a source-to-source parallelization framework called STS-CUDA. STS-CUDA transforms serial programs containing loops into CUDA C parallel programs for the GPU, making CUDA parallel programming more convenient. STS-CUDA works as follows: first, the serial C program is analyzed and STS-CUDA directives related to the parallel transformation are inserted in the appropriate places; then, by recognizing and matching these directives, the program is converted into the corresponding CUDA C parallel program. This paper studies the methods involved in realizing the parallel transformation with STS-CUDA, including dividing tasks reasonably, optimizing communication between host and device, and optimizing access to global memory and shared memory. Examples are tested at the end.

In experiments parallelizing two matrix multiplications and a BP (back-propagation) algorithm with STS-CUDA, the results and speedups are similar to those of handwritten CUDA parallel programs. Future work includes completely shielding the underlying GPU architecture by further reducing the STS-CUDA directives, and optimizing the target code by adding more optimization methods.
Keywords/Search Tags: GPU, GPGPU, CUDA, Source-to-Source Parallelization