
The Automatic Generation System Of The Parallel PCG Method Based On CUDA

Posted on: 2018-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2310330518476613
Subject: Computer technology
Abstract/Summary:
The preconditioned conjugate gradient (PCG) algorithm is a popular method for solving large sparse linear systems, and accelerating it on GPUs has attracted considerable attention in recent years. However, on a specific multi-GPU platform, producing a highly parallel PCG implementation for a given large problem takes significant time, because tuning the related parameters and selecting an appropriate storage format for the matrix block assigned to each GPU involve several manual steps. We therefore use performance-modeling techniques to construct a performance model for each main component of the PCG algorithm, and rapidly generate a parallel PCG implementation by automatically selecting the optimal kernel and its parameters from a set of existing kernels. The main work and contributions are as follows:

1. Parallel optimization performance models are constructed for vector operations and inner products, and decision trees are generated automatically from these models.
2. A parallel optimization performance model is constructed for SpMV (sparse matrix-vector multiplication), using five classical storage formats and their corresponding kernels. Experimental results show that the execution times estimated by this model are more than 95% accurate.
3. A parallel optimization framework for PCG is designed, in which each model is independent and easily extensible.
4. An automatic generation system for the PCG method is implemented. Through a graphical visualization interface, the system builds the parallel optimization performance model for each main component of the PCG algorithm and automatically generates a high-performance PCG implementation. Experimental results show that the generated parallel PCG achieves average speedups of 56.91 on one GPU and 104.06 on two GPUs.
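For orientation, a PCG iteration decomposes into exactly the three kinds of components modeled above: one SpMV, several vector operations (axpy-style updates), and inner products (reductions). The serial Python sketch below is only an illustration of that structure, not the thesis's CUDA implementation. It assumes a Jacobi (diagonal) preconditioner and CSR storage, neither of which the abstract specifies, and all function names (`csr_matvec`, `dot`, `axpy`, `pcg`, `laplacian_1d`) are hypothetical. On a GPU, each helper would correspond to a separate kernel whose configuration a performance model would choose.

```python
from typing import List, Tuple

def csr_matvec(indptr: List[int], indices: List[int], data: List[float],
               x: List[float]) -> List[float]:
    """SpMV component: y = A x with A stored in CSR format."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

def dot(a: List[float], b: List[float]) -> float:
    """Inner-product component (a parallel reduction on a GPU)."""
    return sum(ai * bi for ai, bi in zip(a, b))

def axpy(alpha: float, x: List[float], y: List[float]) -> List[float]:
    """Vector-operation component: returns alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def pcg(indptr: List[int], indices: List[int], data: List[float], b: List[float],
        tol: float = 1e-10, max_iter: int = 1000) -> Tuple[List[float], int]:
    """Jacobi-preconditioned CG for a symmetric positive definite CSR matrix."""
    n = len(b)
    # Jacobi preconditioner M = diag(A); store 1 / a_ii (assumes a nonzero diagonal).
    inv_diag = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                inv_diag[i] = 1.0 / data[k]
    x = [0.0] * n
    r = list(b)                                    # r0 = b - A x0 with x0 = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]     # z0 = M^{-1} r0
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:       # relative residual test
            return x, it
        q = csr_matvec(indptr, indices, data, p)   # SpMV: q = A p
        alpha = rz / dot(p, q)
        x = axpy(alpha, p, x)                      # x = x + alpha p
        r = axpy(-alpha, q, r)                     # r = r - alpha A p
        z = [d * ri for d, ri in zip(inv_diag, r)] # apply preconditioner
        rz_new = dot(r, z)
        p = axpy(rz_new / rz, p, z)                # p = z + beta p
        rz = rz_new
    return x, max_iter

def laplacian_1d(n: int) -> Tuple[List[int], List[int], List[float]]:
    """Hypothetical test matrix: tridiag(-1, 2, -1) in CSR form."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data

# Usage: recover a known solution of a 5x5 system.
indptr, indices, data = laplacian_1d(5)
x_true = [1.0, 2.0, 3.0, 4.0, 5.0]
b = csr_matvec(indptr, indices, data, x_true)
x, iters = pcg(indptr, indices, data, b)
```

Because the SpMV, the inner products, and the vector updates are separate calls, each can be swapped for a differently configured kernel without touching the solver loop, which is the kind of independence the framework in item 3 describes.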
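The idea behind model-driven kernel selection can be illustrated with a much simpler cost model than the one in the thesis. The abstract says the SpMV model estimates execution time per storage format but gives neither the model's form nor the names of the five formats. The sketch below therefore substitutes an assumed proxy: the number of stored elements (values plus indices) each format needs, on the premise that SpMV is usually memory-bandwidth bound. It covers four common formats (COO, CSR, ELL, DIA) purely for illustration, and the function names are hypothetical. A fitted execution-time model would replace `storage_costs` while the selection step stays the same.

```python
from typing import Dict, List

def storage_costs(pattern: List[List[int]]) -> Dict[str, int]:
    """Proxy cost per format; pattern[i] holds the column indices of row i (square matrix)."""
    n = len(pattern)
    nnz = sum(len(row) for row in pattern)
    max_row = max((len(row) for row in pattern), default=0)
    diagonals = {j - i for i, row in enumerate(pattern) for j in row}
    return {
        "COO": 3 * nnz,                   # row index + column index + value per nonzero
        "CSR": 2 * nnz + n + 1,           # column index + value per nonzero, plus row pointers
        "ELL": 2 * n * max_row,           # every row padded to the longest row
        "DIA": len(diagonals) * (n + 1),  # one length-n value array + one offset per diagonal
    }

def select_format(pattern: List[List[int]]) -> str:
    """Pick the format with the lowest predicted cost."""
    costs = storage_costs(pattern)
    return min(costs, key=costs.get)

# A banded matrix favors DIA; one dense row makes padding (ELL) and diagonals (DIA) expensive.
tridiag = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
arrow = [list(range(8))] + [[i] for i in range(1, 8)]
print(select_format(tridiag))  # DIA
print(select_format(arrow))    # CSR
```

The same pattern extends to the vector-operation and inner-product models: given a predicted time for each candidate kernel configuration, the generator keeps the configuration with the smallest prediction.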
Keywords/Search Tags: parallel PCG method, performance model, automatic generation system, CUDA, GPU