Communication Avoiding Generalized Conjugate Residual Method

Posted on: 2020-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: L Yang
GTID: 2370330575970551
Subject: Science of meteorology
Abstract/Summary:
With the development of modern meteorology, numerical weather prediction models are required to run at ever higher resolution. Model speed is the objective prerequisite for increasing resolution: it is a necessary but not sufficient condition. Massively parallel computing is now the main way to improve model speed, and the scale and performance of supercomputing clusters keep increasing. To make full use of the computing power of large-scale clusters, the scalability of numerical models needs to be improved. The main part of the dynamical framework of the "global/regional assimilation and prediction system (GRAPES)" of the China Meteorological Administration (CMA) is the solver of a Helmholtz equation. The iterative algorithm adopted by this solver is the "generalized conjugate residual method (GCR)". The main factor restricting the scalability of the solver is the frequent global communication in the GCR algorithm.

In this paper, we propose the "communication avoiding generalized conjugate residual method (CA-GCR)". Compared with the original algorithm, the new algorithm reduces the number of global communications by one order of magnitude and also removes part of the local computation. Its disadvantage is a small decrease in convergence speed, that is, a small increase in the number of iterations. The new and original algorithms were compared on 32 to 16384 processes on CMA's newly deployed "Sugon Pi" cluster, using global workloads at 1°, 0.5°, 0.25° and 0.05° resolution. Experimental results show that at high resolution and large parallel scale, the new algorithm is superior to the original algorithm in total time, local computing time, communication time and scalability. At the same parallel scale, the new algorithm can be up to 3 times faster than the original algorithm in total time. The number of iterations of the new algorithm is 21% higher than that of the original algorithm.

When the parallel scale is small, the advantage of the new algorithm comes mainly from the reduction in local computation; when the parallel scale is large, it comes mainly from the reduction in global communication. At particularly small scales, the increased number of iterations and the larger memory footprint lengthen the time spent in sparse matrix-vector multiplication and other parts of the new algorithm, so it is slower than the original algorithm in some tests. When the resolution of the test case is low and the parallel scale is particularly large, the performance of the new algorithm is unstable: each process has very little computation and the total running time is extremely short, so the measurements are strongly affected by fluctuations of the computing platform. Because this work aims to provide the necessary conditions for improving the resolution of the numerical model, and the current operational global model runs at 0.5° resolution, we focus on the performance of the two algorithms in the massively parallel, high-resolution 0.25° and 0.05° cases. Based on the test results under these conditions, we conclude that the new algorithm is better than the original algorithm.
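To make the communication bottleneck concrete, the following is a minimal single-process sketch of a plain, untruncated GCR iteration in Python with NumPy. It is not the CA-GCR algorithm of the thesis and not the GRAPES solver. The function name `gcr`, the helper `global_dot` and the small tridiagonal test matrix are assumptions made only for illustration. Every inner product is routed through `global_dot`, which merely counts calls; in a distributed run each such call would correspond to one global reduction across all processes.

```python
import numpy as np


def gcr(A, b, tol=1e-10, max_iter=200):
    """Plain GCR sketch: returns (solution, iterations, number of inner products)."""
    reductions = 0

    def global_dot(u, v):
        # Hypothetical stand-in for a global reduction: on a cluster every
        # inner product needs a sum over all processes. Here we only count.
        nonlocal reductions
        reductions += 1
        return float(np.dot(u, v))

    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    r = b - A @ x
    P, AP, AP_norm2 = [r.copy()], [A @ r], []
    b_norm = np.sqrt(global_dot(b, b))

    for it in range(1, max_iter + 1):
        p, Ap = P[-1], AP[-1]
        AP_norm2.append(global_dot(Ap, Ap))
        alpha = global_dot(r, Ap) / AP_norm2[-1]
        x += alpha * p                      # update the solution
        r -= alpha * Ap                     # update the residual
        if np.sqrt(global_dot(r, r)) <= tol * b_norm:
            return x, it, reductions

        # Build the next direction so that A p_new is orthogonal to all
        # earlier A p_j; this costs one inner product per stored direction.
        Ar = A @ r
        p_new, Ap_new = r.copy(), Ar.copy()
        for pj, Apj, nj in zip(P, AP, AP_norm2):
            beta = -global_dot(Ar, Apj) / nj
            p_new += beta * pj
            Ap_new += beta * Apj
        P.append(p_new)
        AP.append(Ap_new)

    return x, max_iter, reductions


def demo_matrix(n):
    """Illustrative 1-D Helmholtz-like tridiagonal matrix (assumption, not GRAPES)."""
    return 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


if __name__ == "__main__":
    A = demo_matrix(64)
    x, iterations, reductions = gcr(A, np.ones(64))
    print("iterations:", iterations, "inner products:", reductions)
```

In this sketch every iteration performs several separate inner products, and the count grows with the number of stored directions. This is the kind of per-iteration global synchronization that the abstract identifies as the scalability limit of GCR.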
Keywords/Search Tags: Communication avoiding, GRAPES, Helmholtz equation, Parallel computing, GCR