| The high-performance computing industry has shifted from raising the clock frequency of the central processing unit (CPU) to parallel computing on many-core architectures, of which the GPU is an outstanding representative. Because of its excellent parallel computing performance and low power consumption, a growing number of applications are accelerated with GPUs. Radiative transfer is one of the more time-consuming physical processes in atmospheric circulation models: it is computationally intensive, so radiative transfer simulation consumes a large amount of computational resources. The current mainstream radiative transfer model, RRTMG, greatly reduces the amount of computation while preserving accuracy, but it still accounts for 25-35% of the computation time of the physical processes. At the same time, the RRTMG radiative transfer algorithm has weak data dependence, which makes it a good candidate for parallelization. Therefore, to improve the computational efficiency of RRTMG, it is necessary to accelerate it with GPU technology. The main work of this paper is as follows: (1) 1D, 2D and 3D GPU acceleration algorithms for RRTMG_LW are proposed, and a GPU version of RRTMG_LW is implemented in CUDA Fortran. In the 3D GPU acceleration algorithm, a “first parallel, then accumulate” acceleration method is proposed for the subroutine rtrnmc, which improves both its parallel algorithm and its parallel computing efficiency. Experimental results show that, on a single GPU, RRTMG_LW achieves a 30.98× speedup over a single CPU core. (2) The GPU-accelerated RRTMG_LW is applied to the Earth system model CAS-ESM, enabling fast CAS-ESM calculation. Because the computational performance of a single GPU is limited, and in order to further improve the performance of RRTMG_LW and fully utilize the multi-node, multi-GPU hardware resources of supercomputers, this paper proposes a multi-node, multi-GPU acceleration algorithm for RRTMG_LW based on the MPI+CUDA Fortran hybrid programming paradigm. The experimental results show that RRTMG_LW achieves a 78.12× speedup on 16 K20 GPUs. The series of GPU acceleration algorithms proposed for RRTMG_LW improves the computational efficiency of the long-wave radiation physical process, enables large-scale fast calculation of CAS-ESM, and lays a solid foundation for research on heterogeneous computing algorithms for other physical processes. |
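
The “first parallel, then accumulate” idea named for rtrnmc can be illustrated with a minimal sketch. This is not the paper's CUDA Fortran implementation. It assumes that the accumulated quantity is a per-column sum over spectral g-points, which the abstract does not state. The names `partial_flux`, `first_parallel_then_accumulate` and `serial_reference` are hypothetical, and the arithmetic inside `partial_flux` is a placeholder for the real radiative computation. A thread pool stands in for GPU threads.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def partial_flux(column, g_point):
    """Hypothetical stand-in for the work done for one (column, g-point) pair."""
    return (column + 1) * 0.5 * (g_point + 1)


def first_parallel_then_accumulate(n_columns, n_g_points):
    # Phase 1 ("first parallel"): every (column, g-point) pair is independent,
    # so each result goes to its own slot and no shared accumulator is touched.
    pairs = list(product(range(n_columns), range(n_g_points)))
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as `pairs`.
        partials = list(pool.map(lambda pair: partial_flux(*pair), pairs))

    # Phase 2 ("then accumulate"): a separate pass sums the stored partial
    # results over g-points for each column.
    totals = [0.0] * n_columns
    for (column, _g_point), value in zip(pairs, partials):
        totals[column] += value
    return totals


def serial_reference(n_columns, n_g_points):
    # Conventional loop nest that accumulates while it computes; the running
    # sum is what would serialize a naive parallel version over g-points.
    totals = [0.0] * n_columns
    for column in range(n_columns):
        for g_point in range(n_g_points):
            totals[column] += partial_flux(column, g_point)
    return totals


if __name__ == "__main__":
    print(first_parallel_then_accumulate(3, 4))
    print(serial_reference(3, 4))
```

Separating the two phases trades extra storage for the partial results against the removal of write conflicts on the accumulator. Whether that trade pays off on a given GPU depends on memory capacity and on how the accumulation pass itself is parallelized.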