
The Performance Prediction And Efficient Implementation Of The Conjugate Gradient Method On The CPU+GPU Architecture

Posted on: 2018-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: R X Wang
Full Text: PDF
GTID: 2350330542985202
Subject: Computational Mathematics
Abstract/Summary:
In recent years, Tianhe-2 was ranked first in the Top500 list of supercomputers for several consecutive years, and Sunway TaihuLight took first place last year. It is fair to say that China has moved to the forefront of the world in supercomputer hardware. To make full use of such enormous computing power, efficient parallel algorithms must be designed and evaluated in close combination with the characteristics of supercomputer architectures. A prominent feature of current supercomputers is that they are heterogeneous and many-core: with the processing speed of CPUs approaching its physical limit, supercomputers rely on GPUs (or MICs) for cooperative acceleration, and CPU+GPU has become the basic supercomputer architecture. On the other hand, many applications in scientific and engineering computation reduce to the solution of large sparse linear algebraic systems, whose solution time often accounts for a very large proportion of the whole numerical simulation (sometimes more than 80%, as in radiation hydrodynamics or radiation transport simulations). It is therefore of great importance to study the performance evaluation and efficient parallel solution of linear algebraic systems on the CPU+GPU architecture. This thesis takes the conjugate gradient (CG) method as its example; its main contributions are as follows.

Firstly, we establish two new performance prediction models for sparse matrix-vector multiplication (SpMV) on the GPU, a uniform distribution model and a normal distribution model, covering four sparse matrix storage formats: CSR-V, CSR-S, ELL, and JAD. Starting from the matrix structure and a statistical point of view, the two-parameter performance prediction functions of both models are constructed by linear fitting. The two parameters of a target matrix are extracted from its structure and scale and substituted into the prediction functions, which then predict the SpMV runtime for that matrix. This overcomes a shortcoming of the models in the literature, which do not account for matrix structure when predicting SpMV performance on the GPU. Experimental results show that the prediction accuracy of our two models is on average 1.69 times higher than that of the model in the literature, and the average error between estimated and measured time is less than 3.5%.

Secondly, we establish a performance prediction model for the CG method on the CPU+GPU architecture. After analyzing the communication between CPU and GPU and the kernel computations of CG, we construct performance prediction functions for communication and computation that take overlapping into account, and we also build prediction models for the vector inner product and the vector update. Combined with the SpMV model above, this yields a performance prediction model for the whole CG algorithm on the CPU+GPU architecture. Numerical experiments were carried out on a PC and on the supercomputer "Yuan" at the Computer Network Information Center of the Chinese Academy of Sciences. The results show that the average error between the estimated and measured time of the CG method is less than 4.3%. The performance prediction model constructed in this thesis can easily be extended to other Krylov subspace iteration methods (such as GMRES and BiCGSTAB).

Thirdly, the CG method is implemented more efficiently on the CPU+GPU architecture by overlapping computations. Analysis of the CG algorithm shows that, in each iteration, the vector inner product on the CPU and the matrix-vector multiplication on the GPU are independent, so they can be computed at the same time; this overlapping reduces the computation time and improves performance. We further apply the overlapping technique to the MCG algorithm, a rearrangement of CG in which the matrix-vector multiplication can overlap one vector inner product and two vector updates in each iteration, giving a higher degree of overlap. The experimental results show that the performance improvement of MCG after overlapping is correspondingly larger.
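The abstract does not give the two-parameter prediction functions themselves; the sketch below only illustrates the general idea in plain Python: an SpMV kernel for the CSR format, and a hypothetical linear model t ≈ a·nnz + b fitted by least squares to measured runtimes. The function names, and the choice of the nonzero count nnz as the single feature, are assumptions for illustration, not the thesis's actual model.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Sparse matrix-vector product y = A @ x, with A stored in CSR format."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        # nonzeros of row i live in vals[row_ptr[i]:row_ptr[i + 1]]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[col_idx[k]]
        y[i] = s
    return y

def fit_linear(features, runtimes):
    """Least-squares fit runtime ≈ a * feature + b: the two model parameters."""
    n = len(features)
    mx = sum(features) / n
    my = sum(runtimes) / n
    a = sum((f - mx) * (t - my) for f, t in zip(features, runtimes)) \
        / sum((f - mx) ** 2 for f in features)
    b = my - a * mx
    return a, b

def predict_runtime(a, b, nnz):
    """Predicted SpMV runtime for a target matrix with nnz nonzeros."""
    return a * nnz + b
```

In the thesis the fit is done per storage format (CSR-V, CSR-S, ELL, JAD) and the features come from the matrix structure and scale; here a single feature keeps the sketch minimal.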
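The overlap exploited in the third contribution can be sketched in pure Python, with a worker thread standing in for the asynchronous GPU kernel. This is an assumed scheme, not the thesis's implementation: it uses the conjugacy formula beta = -(r_new, q)/(p, q), so the new search direction p does not have to wait for (r, r); that inner product can then run on the CPU while the SpMV q = A p runs concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg_overlap(matvec, b, x0, tol=1e-12, max_iter=200):
    """CG in which the (r, r) inner product (CPU side in the thesis) overlaps
    the SpMV q = A p (GPU side).  Because beta is taken from the conjugacy
    formula beta = -(r_new, q)/(p, q), the direction update never waits for
    (r, r), which is needed only for alpha and the stopping test."""
    x = list(x0)
    r = [bi - qi for bi, qi in zip(b, matvec(x))]
    p = list(r)
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(max_iter):
            spmv = pool.submit(matvec, p)   # asynchronous "GPU" kernel
            rho = dot(r, r)                 # overlapped CPU inner product
            q = spmv.result()               # join before computing alpha
            if rho <= tol:
                break
            pq = dot(p, q)
            alpha = rho / pq
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            r = [ri - alpha * qi for ri, qi in zip(r, q)]
            beta = -dot(r, q) / pq          # = rho_new / rho in exact arithmetic
            p = [ri + beta * pi for ri, pi in zip(r, p)]
    return x
```

The thread pool plays the role of an asynchronous kernel launch (e.g. a CUDA stream); in a real CPU+GPU code the join point would be a stream synchronization rather than `Future.result()`.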
Keywords/Search Tags:Conjugate gradient method, Sparse matrix-vector multiplication, Performance prediction model, Uniform distribution, Normal distribution, CPU+GPU architecture, Overlapping