
Research On Optimization Of Parallel Algorithm For Global Spectral Model Based On Overlapping Computation With Communication

Posted on: 2023-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: D Z Liu
GTID: 2530307169979769
Subject: Journal of Atmospheric Sciences
Abstract/Summary:
This thesis takes global spectral models for numerical prediction as its research object. The communication overhead of the Yin-he Global Spectral Model (YHGSM) and the Beijing Climate Center Atmospheric General Circulation Model (BCC_AGCM) grows as the number of cores increases. To reduce this overhead, the thesis proposes parallel optimization schemes that overlap communication with computation by means of vertical level grouping, latitude circle grouping, and pipelining. Experiments compare the two models before and after optimization under multiple core configurations. The results show that the schemes based on overlapping computation with communication effectively hide the communication overhead of global spectral models for numerical prediction and reduce the wall clock running time of the models, while preserving the correctness of the results.

In the grid-space semi-Lagrangian interpolation scheme of YHGSM, communication must be completed before the interpolation computation is performed. Because the number of levels in the vertical direction is large, the resulting communication overhead is relatively large. To address this problem, the thesis proposes an optimization scheme that overlaps computation with communication through vertical level grouping. The vertical levels are divided into three groups, and non-blocking communication is used to overlap the computation of the latter group with the communication of the former group, so that the communication of two groups is hidden. Experiments show that this scheme reduces the running time of the YHGSM semi-Lagrangian interpolation scheme by 0.05 s and increases its running speed by 12.50%. Because the semi-Lagrangian interpolation stage accounts for only about 10% of the whole model, the running time of a 100-step model run is reduced by at most 6.32 s, in the 8×16 latitude-by-longitude configuration. When the total number of cores is fixed, the larger the number of cores in the latitude direction, the shorter the running time and the better the performance. As the number of cores increases, the speedup ratio tends to increase, while the parallel efficiency decreases gradually. Simulations of the temperature at the 60th level and of the east-west wind speed at 850 hPa are consistent before and after the YHGSM optimization, which verifies the correctness of the optimization scheme.

In PAGCM, the two-dimensional parallel version of BCC_AGCM, the two-dimensional domain decompositions of grid space, Fourier space and spectral space are inconsistent, so the three-dimensional data must be redistributed for each space. The Fourier transform stage accounts for a large proportion of the running time, and the grid point computation and the forward and inverse Fourier transforms are carried out by latitude circle, so the data can be grouped by latitude circles. The communication overhead is hidden by overlapping the redistribution communication of each group of data with the Fourier transform computation. By making the number of groups an adjustable parameter, computation and communication can be pipelined, which further improves the overlap efficiency. The test results show that, with computation and communication overlapped, the total running time of the dynamical framework over a 10-day BCC_AGCM run is reduced. In the 16×4 configuration, the optimal grouping is Ng=10 and the running time is reduced by 6.19%. For the Fourier transform stage in the 4×16 latitude-by-longitude configuration, the optimal grouping is Ng=8 and the running time is reduced by 18.84%. For the inverse Fourier transform stage in the 16×4 latitude-by-longitude configuration, the optimal grouping is Ng=10 and the running time is reduced by 25.41%. The pipelined overlapping scheme therefore has an optimal number of groups that can be found experimentally. Scalability experiments show that the larger the number of processes in the latitude direction, the shorter the running time of the optimization scheme. The 32×4 latitude-by-longitude configuration performs best, and scaling brings little further benefit once the total number of cores exceeds 128. Overall, the three-dimensional data redistribution scheme based on overlapping computation with communication effectively hides the communication overhead, and tuning the number of groups to find the best grouping maximizes model efficiency and reduces model running time. Simulations of the temperature and the east-west wind speed at the 13th level are consistent before and after the BCC_AGCM optimization, which verifies the correctness of the optimization scheme.
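
To illustrate why a pipelined overlap of communication and computation can have an optimal number of groups, the following Python sketch uses a toy timing model. It is not the implementation used in the thesis. The function names, the fixed per-message latency term, and every numeric value are assumptions made for illustration. The model assumes that the transfer of each group runs back to back, and that the computation on a group starts once its data has arrived and the previous group's computation has finished.

```python
def pipelined_time(comm_total, comp_total, latency, n_groups):
    """Toy wall-clock estimate for a grouped, pipelined overlap.

    comm_total: time to transfer all data in one message (hypothetical units)
    comp_total: time to compute on all data
    latency:    assumed fixed start-up cost paid by every message
    n_groups:   number of groups the data is split into (Ng in the abstract)
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    comm = latency + comm_total / n_groups  # transfer time of one group
    comp = comp_total / n_groups            # compute time of one group
    comm_done = 0.0
    comp_done = 0.0
    for _ in range(n_groups):
        comm_done += comm  # transfers are issued one after another
        # A group is computed once its data has arrived and the
        # previous group's computation has finished.
        comp_done = max(comp_done, comm_done) + comp
    return comp_done


def best_group_count(comm_total, comp_total, latency, max_groups):
    """Return the group count in 1..max_groups with the smallest estimate."""
    return min(
        range(1, max_groups + 1),
        key=lambda n: pipelined_time(comm_total, comp_total, latency, n),
    )


if __name__ == "__main__":
    # Hypothetical costs: 8 units of transfer, 8 units of computation,
    # 0.5 units of start-up latency per message.
    for n in (1, 2, 4, 8, 16):
        print(n, pipelined_time(8.0, 8.0, 0.5, n))
    print("best:", best_group_count(8.0, 8.0, 0.5, 16))
```

In this model, a single group gives no overlap at all, while a very large number of groups pays the per-message latency many times, so the estimate is smallest at an intermediate group count. With the hypothetical costs above the minimum falls at four groups. Real optima, such as the Ng=8 and Ng=10 reported in the abstract, depend on the machine, the message sizes and the process configuration.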
Keywords/Search Tags:numerical prediction technology, Global Spectral Model, YHGSM, BCC_AGCM, parallel computing, overlap computation and communication