Cosmological N-body Simulation On A Many-core Architecture

Posted on: 2021-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Liu
GTID: 2370330611973244
Subject: Computer Science and Technology
Abstract/Summary:
Cosmological simulations are essential for astronomers studying the formation of non-linear structures and testing hypotheses about dark matter, dark energy, and related phenomena. The commonly used model, a collisionless system of dark matter particles, is a classical N-body problem. High-precision simulations involve hundreds of billions or even trillions of particles and therefore demand massive computational power and highly efficient algorithms. Cosmological N-body simulation has long been an important branch of high-performance computing, and research teams outside China have won the Gordon Bell Prize several times with large-scale cosmological N-body simulation projects. Sunway TaihuLight, independently designed and developed in China, is the first supercomputer with a peak performance above 100 PFlops. However, no large-scale cosmological simulation has yet been run on Sunway TaihuLight.

After a thorough study of PHoToNs, a cosmological N-body simulation code developed by the National Astronomical Observatories of the Chinese Academy of Sciences, this thesis proposes several performance optimization schemes tailored to the unique hardware structure of the domestic many-core processor SW26010 and redesigns the module that computes the forces between particles. The result is SwPHoToNs, an N-body simulation code that takes full advantage of the architecture of Sunway TaihuLight. Using SwPHoToNs, we conduct cosmological simulations containing up to 640 billion particles on 5,200,000 cores, obtaining a sustained performance of 29.44 PFlops with a weak-scaling parallel efficiency of 84.6% and a computational efficiency of 48.3%.

The main research work of this thesis is as follows:

(1) Given the memory limit of the SW26010 processor and the Tree-PM algorithm used by PHoToNs, an MPI communication strategy is designed to better balance communication and computation across processes and to relieve memory pressure, achieving higher parallel efficiency and a larger simulation scale. Based on this strategy, we exploit the parallelism between the master core and the slave cores to design two dedicated pipeline modes: one for the gravity calculation among particles inside a process, and one for the gravity calculation between particles in different processes.

(2) For the unique memory structure of the slave core array of the SW processor, we allocate a redundant array to eliminate the duplicated gravity computation caused by dual tree traversal and pairwise force interaction. Together with a sorting scheme designed for the generation of computing tasks on the MPE, this redundant array resolves the read-write conflicts of the parallel computation.

(3) To make full use of the computing performance of the SW processor, we use the LDM and DMA to reduce the cost of reading data, and design pipeline modes that overlap DMA transfers with computation. The core calculation function is manually vectorized after loop unrolling. Because gravity is split into long-range and short-range parts, two very time-consuming transcendental functions appear in the kernel: exp and erf. We approximate both functions by polynomial fitting and design vectorized versions of them, which completes the vectorization of the core calculation code.
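
The role of exp and erf in item (3) can be illustrated with a minimal sketch. The abstract does not state the splitting formula, so the code below assumes the Gaussian force split commonly used in Tree-PM codes, in which the Newtonian pair force is multiplied by erfc(u) + (2u/sqrt(pi)) exp(-u^2) with u = r / (2 r_s). The names `short_range_factor`, `exp_poly`, `erfc_poly`, and the split scale `r_s` are hypothetical. The polynomial approximations are standard textbook ones (a Taylor polynomial with range reduction for exp, and the Abramowitz and Stegun 7.1.26 formula for erfc), substituted here for illustration; they are not the fits used in SwPHoToNs.

```python
import math

# Abramowitz & Stegun 7.1.26 constants (absolute error about 1.5e-7 for x >= 0).
# Assumption: a textbook stand-in, not the polynomial fit from the thesis.
_P = 0.3275911
_A = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)

_LN2 = math.log(2.0)
# Taylor coefficients 1/k! for k = 8..0, highest degree first for Horner's rule.
_EXP_COEFFS = tuple(1.0 / math.factorial(k) for k in range(8, -1, -1))


def exp_poly(x):
    """exp(x) via range reduction x = k*ln2 + r and a degree-8 polynomial in r."""
    k = round(x / _LN2)
    r = x - k * _LN2              # |r| <= ln2 / 2 after reduction
    acc = 0.0
    for c in _EXP_COEFFS:         # Horner evaluation: multiply-adds only
        acc = acc * r + c
    return math.ldexp(acc, k)     # acc * 2**k


def erfc_poly(x):
    """erfc(x) for x >= 0 as a polynomial in t = 1/(1 + p*x) times exp(-x^2)."""
    t = 1.0 / (1.0 + _P * x)
    acc = 0.0
    for a in reversed(_A):        # Horner form of a1*t + a2*t^2 + ... + a5*t^5
        acc = (acc + a) * t
    return acc * exp_poly(-x * x)


def short_range_factor(r, r_s, erfc=math.erfc, exp=math.exp):
    """Factor multiplying the Newtonian pair force in an assumed Gaussian split.

    r   : particle separation (r >= 0)
    r_s : split scale between long-range (PM) and short-range (tree) parts
    """
    u = r / (2.0 * r_s)
    return erfc(u) + (2.0 * u / math.sqrt(math.pi)) * exp(-u * u)


if __name__ == "__main__":
    for r in (0.0, 0.5, 1.0, 2.0, 5.0):
        exact = short_range_factor(r, 1.0)
        approx = short_range_factor(r, 1.0, erfc=erfc_poly, exp=exp_poly)
        print(f"r={r:4.1f}  library={exact:.8f}  polynomial={approx:.8f}")
```

Both approximations reduce to a fixed sequence of multiply-add operations with no data-dependent branches, which is the property that makes such a kernel straightforward to map onto SIMD lanes. The accuracy actually required by a production simulation is a separate question that this sketch does not address.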
Keywords/Search Tags: Cosmological N-body simulation, SW26010 many-core processor, Parallel computing, Performance optimization