| The design, operation, and safety analysis of advanced nuclear reactors require an accurate understanding of the physical behavior and laws governing the reactor. Advanced nuclear reactors are characterized by a high degree of heterogeneity and increasingly complex geometry, which calls for higher-fidelity reactor core simulation. Because the method of characteristics (MOC) can easily treat arbitrary geometries, it has been widely used in high-fidelity neutron transport calculations. However, the computational burden of whole-core MOC simulation is large, which makes it time-consuming. An advantage of MOC is that the very large number of characteristic rays provides inherent parallelism, so large-scale parallelization can be used to improve efficiency. In recent years, GPUs have developed powerful floating-point and parallel computing capabilities, and CPU/GPU heterogeneous parallelization has become a significant new trend in the scientific computing community. The focus of this thesis is the development of a highly scalable and efficient parallel algorithm on CPU/GPU architectures for whole-core high-fidelity neutron transport calculations with MOC. In particular, this work addresses the design of parallel algorithms, performance optimization, heterogeneous parallelization, CPU/GPU concurrent computation, scaling analysis, and load balancing. The main contributions are as follows:

(1) A GPU-accelerated parallel algorithm is proposed for the 2D MOC solution of the neutron transport equation. The algorithm is parallelized over characteristic rays, energy groups, and polar angles. Correspondingly, three parallel schemes are established and their efficiency and performance are analyzed. The analysis shows that the ray-energy-group-level parallel scheme is well suited to the GPU-accelerated parallel algorithm.

(2) Once the parallel algorithm is established, a performance model for the particular implementation is derived. This model effectively characterizes the performance of the GPU-accelerated parallel algorithm, including floating-point efficiency, memory-access efficiency, cache hit rate, and so on, so that the bottlenecks of the algorithm can be identified for further optimization. The performance analysis shows that the GPU-accelerated 2D MOC parallel algorithm is limited by memory-access latency and instruction-intensive operations. The subsequent optimizations therefore focus on improving memory-access efficiency and instruction throughput, and together they improve the performance of the GPU transport sweep by a factor of 1.6. The GPU shows powerful computing capacity, especially in single precision: the 1080 Ti GPU achieves a 100× speedup over a single core of the i9-7900 CPU. Based on the performance analysis and optimizations, the 2D MOC neutron transport calculation is efficiently parallelized on the GPU, which significantly improves the efficiency of 2D high-fidelity simulation.

(3) CPU/GPU heterogeneous parallelization is implemented for the 2D MOC neutron transport calculation. First, the heterogeneous parallel algorithm is established by combining spatial domain decomposition with the ray-energy-group-level parallel scheme. Then, a parallel efficiency analysis model is proposed to quantify the strong scaling performance of the heterogeneous parallel algorithm. Finally, guided by the strong scaling analysis, the parallel efficiency is improved by overlapping MPI communication with computation and by using asynchronous data copies between the GPU and CPU. Numerical results demonstrate that the parallel efficiency analysis model accurately identifies the main factors affecting the strong scaling efficiency of the heterogeneous parallel algorithm, and the related optimizations improve the strong scaling efficiency from 87% to 95%. The proposed algorithms and models make full use of the computing resources of multiple GPUs, and the corresponding optimizations effectively reduce the impact of data communication overhead on the scalability of the heterogeneous parallel algorithm.

(4) To exploit the computing power of the heterogeneous system efficiently, a CPU/GPU concurrent computing technique is implemented that allows multi-core CPUs and multiple GPUs to perform the MOC neutron transport calculation concurrently. A dynamic workload assignment model is proposed to detect and concurrently utilize all available computing resources. The numerical results show that the workload assignment model accurately predicts the optimal workload assignment, ensuring load balance in CPU/GPU concurrent computing, and CPU/GPU concurrent computing achieves an improvement of over 11% compared with heterogeneous parallelization alone. Because the GPU is far more powerful than the CPU, the performance gain from CPU/GPU concurrent computing is limited.

(5) The MOC-EX method for solving the 3D neutron transport problem is derived using the diamond difference approximation, and the corresponding CPU/GPU heterogeneous parallel algorithm is implemented. In the MOC-EX method, a series of planar MOC solutions is performed for all axial layers, so the algorithms and models developed for the 2D MOC calculation extend readily to the 3D MOC-EX calculation. In addition, the whole-core neutron transport calculation is performed within the coarse mesh finite difference (CMFD) acceleration framework, and the parallelization of the CMFD calculation on the GPU is also studied. In particular, the linear system arising in the CMFD framework is solved with the successive over-relaxation (SOR) method, which is parallelized using the red-black ordering strategy.

(6) The Advanced Lattice Physics code based on Heterogeneous Architecture (ALPHA) is developed from the parallel algorithms, models, and optimization strategies proposed in this thesis. Numerical results are presented for a range of benchmark problems typically solved by neutron transport codes. The 3D heterogeneous parallel algorithm shows excellent accuracy and stability. The GPU also shows tremendous computing power: the 1080 Ti GPU provides speedups of about 40× and 15× for the MOC-EX and CMFD calculations, respectively, compared with single-core execution on the i9-7900 CPU. The computing time of the TAKEDA benchmark problem is reduced from 6.5 h on the CPU to 11 min on the GPU, and a 34× speedup is observed. For the PWR octant-core problem with 7-group macroscopic cross sections, about 85% strong scaling efficiency is observed when 10 heterogeneous nodes are used in the simulation. For the PWR octant-core problem with 47-group macroscopic cross sections, heterogeneous parallelization is approximately 10 times faster than CPU-based parallelization.

In summary, this thesis develops an effective and stable heterogeneous parallel algorithm for whole-core high-fidelity neutron transport problems. Heterogeneous parallelization significantly improves computing efficiency while attaining the desired accuracy. This work advances the feasibility of high-fidelity neutron transport calculations for practical reactor simulations. |
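Contribution (1) treats each (characteristic ray, energy group) pair as an independent unit of work. The following is a minimal Python sketch of that idea, not the thesis implementation. It assumes the standard flat-source MOC segment update `psi_out = psi_in*exp(-Sigma_t*s) + (q/Sigma_t)*(1 - exp(-Sigma_t*s))`, where `q` is the isotropic source per steradian. The names `Segment`, `sweep_ray_group`, and `sweep_all` are hypothetical. Polar-angle weights, quadrature, and boundary conditions are omitted, and the loop over `tid` stands in for a GPU thread index.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    region: int    # flat-source region crossed by this ray segment
    length: float  # chord length of the segment (cm)


def sweep_ray_group(segments, g, psi_in, sigma_t, q):
    """Sweep one ray in one energy group: the work of one GPU thread (sketch)."""
    psi = psi_in
    tallies = []  # (region, delta_psi) contributions to the scalar flux
    for seg in segments:
        tau = sigma_t[seg.region][g] * seg.length     # optical length
        q_over_sigma = q[seg.region][g] / sigma_t[seg.region][g]
        delta = (psi - q_over_sigma) * (1.0 - math.exp(-tau))
        tallies.append((seg.region, delta))
        psi -= delta                                  # flat-source attenuation
    return psi, tallies


def sweep_all(rays, n_groups, psi_in, sigma_t, q):
    """Flatten (ray, group) pairs into one index space, as a GPU grid would."""
    psi_out = [[0.0] * n_groups for _ in rays]
    delta_sum = [[0.0] * n_groups for _ in sigma_t]
    for tid in range(len(rays) * n_groups):   # one GPU thread per tid
        r, g = divmod(tid, n_groups)          # group index varies fastest
        psi_out[r][g], tallies = sweep_ray_group(rays[r], g, psi_in[r][g], sigma_t, q)
        for region, delta in tallies:
            delta_sum[region][g] += delta     # would need atomicAdd on a GPU
    return psi_out, delta_sum
```

On a GPU, the accumulation into `delta_sum` is the only write shared between threads. That is why it would need an atomic add, while each outgoing angular flux is private to its (ray, group) pair.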
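Contribution (2) derives a performance model that accounts for floating-point efficiency, memory-access efficiency, and cache hit rate. The thesis model itself is not reproduced here. As a stand-in, a roofline-style bound is a common way to decide whether a kernel is compute-bound or memory-bound. In the sketch below, `roofline_bound` is a hypothetical helper. The cache treatment is a deliberate simplification: it assumes cache hits cost no DRAM traffic.

```python
def roofline_bound(peak_flops, bandwidth, flops, bytes_requested, hit_rate=0.0):
    """Roofline estimate of attainable throughput (simplified sketch).

    peak_flops      -- peak floating-point rate of the device (FLOP/s)
    bandwidth       -- DRAM bandwidth (bytes/s)
    flops           -- floating-point operations performed by the kernel
    bytes_requested -- bytes the kernel loads and stores
    hit_rate        -- assumed fraction of requests served by cache, in [0, 1)
    """
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    dram_bytes = bytes_requested * (1.0 - hit_rate)  # assumption: hits are free
    intensity = flops / dram_bytes                   # arithmetic intensity, FLOP/byte
    memory_roof = intensity * bandwidth
    attainable = min(peak_flops, memory_roof)
    limiter = "memory" if memory_roof < peak_flops else "compute"
    return attainable, intensity, limiter
```

With made-up numbers, a kernel performing 50 FLOP per 10 bytes on a device with a peak of 100 and a bandwidth of 10 (arbitrary units) is memory-bound at 50. If 75% of its requests hit in cache, it becomes compute-bound at the peak. This is the kind of shift that memory-access optimizations aim for.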
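Contribution (3) improves strong scaling by overlapping MPI communication with computation and by using asynchronous CPU/GPU copies. The toy model below shows why overlap helps. It is a hypothetical illustration, not the thesis's parallel efficiency model. It assumes the serial work divides evenly over `p` ranks and that communication costs a fixed `t_comm` per step, which can be fully hidden behind computation when overlap is perfect.

```python
def step_time(t_comp, t_comm, overlap):
    """Per-step time of one rank under a simple overlap model (assumption)."""
    # Without overlap, the rank computes and then waits for the exchange.
    # With perfect overlap (non-blocking MPI plus asynchronous copies),
    # only the longer of the two phases is exposed.
    return max(t_comp, t_comm) if overlap else t_comp + t_comm


def strong_scaling_efficiency(t_serial, p, t_comm, overlap):
    """E(p) = T(1) / (p * T(p)) for a fixed total problem size."""
    t_p = step_time(t_serial / p, t_comm, overlap)
    return t_serial / (p * t_p)
```

For example, with `t_serial = 100`, `p = 10`, and `t_comm = 1`, the efficiency is 100/110 ≈ 0.91 without overlap and 1.0 with perfect overlap. Once `t_comm` exceeds the per-rank compute time, overlap can no longer hide it.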
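Contribution (4) assigns work to CPUs and GPUs so that they finish at the same time. The following is a minimal sketch of one common approach, not the thesis's model. It measures each device's throughput on a trial run, then splits `n_units` work units (for example, rays or axial planes) in proportion to those rates, using largest-remainder rounding. The function names and the notion of a "work unit" are hypothetical.

```python
import math


def measured_rates(units_done, seconds):
    """Throughput of each device (units/s) from a timed trial run."""
    return [u / s for u, s in zip(units_done, seconds)]


def assign_workload(n_units, rates):
    """Split n_units across devices in proportion to their measured rates."""
    total = sum(rates)
    ideal = [n_units * r / total for r in rates]
    counts = [math.floor(x) for x in ideal]
    leftover = n_units - sum(counts)
    # Give the remaining units to the devices with the largest fractional share.
    by_remainder = sorted(range(len(rates)), key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def predicted_time(counts, rates):
    """Wall time is set by the slowest device."""
    return max(c / r for c, r in zip(counts, rates))
```

Under this model, the ideal speedup of concurrent CPU+GPU execution over GPU-only execution is `(r_cpu + r_gpu) / r_gpu`. When the GPU rate dominates, this ratio is only slightly above 1, which is consistent with the limited gain reported in the abstract.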
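Contribution (5) couples the planar MOC solutions axially using the diamond difference approximation. The exact MOC-EX formulation is not given in the abstract, so the following is only a generic 1D illustration of a diamond-difference sweep through axial layers for one direction with direction cosine `mu > 0`. It assumes a known per-layer source `q`. The helper `dd_axial_sweep` is hypothetical. It enforces the layer balance `mu*(psi_out - psi_in)/dz + sigma_t*psi_center = q` together with the closure `psi_center = (psi_in + psi_out)/2`.

```python
def dd_axial_sweep(psi_bottom, mu, dz, sigma_t, q):
    """Diamond-difference sweep upward through axial layers (sketch)."""
    psi_in = psi_bottom
    centers = []
    for k in range(len(dz)):
        a = mu / dz[k]
        b = 0.5 * sigma_t[k]
        # Solve the balance equation for the outgoing face flux.
        psi_out = ((a - b) * psi_in + q[k]) / (a + b)
        centers.append(0.5 * (psi_in + psi_out))  # diamond closure
        psi_in = psi_out
    return psi_in, centers
```

Plain diamond difference can produce negative outgoing fluxes in optically thick layers (when `sigma_t*dz > 2*mu`). Production solvers typically add a negative-flux fixup, which this sketch omits.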
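Contribution (5) also parallelizes the SOR solution of the CMFD linear system through red-black ordering. In the sketch below, the one-group, constant-coefficient 2D five-point system `(4 + sigma*h^2)*u[i][j] - (sum of 4 neighbors) = h^2*f[i][j]` with zero boundary values stands in for the actual CMFD operator, which is an assumption. The function name `red_black_sor` is hypothetical. Every neighbor of a red point is black and vice versa, so all points of one color can be updated simultaneously, for example one GPU thread per point.

```python
def red_black_sor(f, sigma, h, omega=1.5, tol=1e-10, max_iter=10000):
    """Red-black SOR for a 5-point diffusion-like system (sketch).

    f is an n x n interior source; u carries a zero Dirichlet boundary ring.
    """
    n = len(f)
    u = [[0.0] * (n + 2) for _ in range(n + 2)]
    diag = 4.0 + sigma * h * h
    for it in range(max_iter):
        max_change = 0.0
        for color in (0, 1):  # 0 = "red", 1 = "black"
            # Points of one color depend only on the other color, so this
            # double loop has no internal dependencies and can run in parallel.
            for i in range(1, n + 1):
                for j in range(1, n + 1):
                    if (i + j) % 2 != color:
                        continue
                    gs = (h * h * f[i - 1][j - 1] + u[i - 1][j] + u[i + 1][j]
                          + u[i][j - 1] + u[i][j + 1]) / diag
                    new = (1.0 - omega) * u[i][j] + omega * gs
                    max_change = max(max_change, abs(new - u[i][j]))
                    u[i][j] = new
        if max_change < tol:
            return u, it + 1
    return u, max_iter
```

The relaxation factor `omega = 1.5` is an arbitrary choice for this small grid. In practice, it is tuned to the spectral properties of the CMFD operator.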