
Parallel Design And Optimization Of A Galaxy Group Finding Algorithm On SGI Vs. Distributed-memory Cluster

Posted on: 2018-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Si
Full Text: PDF
GTID: 2370330596989159
Subject: Computer technology
Abstract/Summary:
As one of the most important research topics in astrophysics, galaxy group finding greatly aids the study of the formation and evolution of galaxy groups and of large-scale structure in the universe. The Halo-based Galaxy Group Finder (HGGF) is an effective algorithm, based on dark matter halos, that identifies galaxy groups from galaxy coordinates, redshift, mass, and other properties. However, the current pure OpenMP implementation is limited by its programming model: it can use the resources of only a single compute node, which undermines its application to large-scale group finding problems. One possible solution is to run such problems on Coherent Shared Memory (CSM) machines, which offer large CPU and memory resources; however, their high price is a major concern, and their effectiveness for this kind of algorithm still needs to be tested. Another solution is to use the resources of multiple cluster nodes, which reduces execution time and makes large problems solvable. The latter approach requires redesigning and reimplementing the algorithm, and a major hurdle is remote memory access: the algorithm's semi-random galaxy accesses degrade performance in a multi-node environment. To tackle this problem, we parallelized the algorithm using an adjacent galaxy list design and implemented it in Unified Parallel C (UPC). Kernel speedups of 2.25, 2.78, and 5.07 times were achieved with 4, 8, and 16 nodes, respectively. Moreover, the memory requirement on each node was reduced significantly. Experiments with the OpenMP version of the algorithm on an SGI UV 2000 show that, because of the nature of the program and the characteristics of the NUMA architecture, programs with random memory access behavior such as HGGF may not readily benefit from the large number of threads and the shared memory that such CSM machines provide. A two-level parallel design that exploits the locality principle on distributed-memory clusters solves the HGGF problem more efficiently.
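
The reported kernel speedups can be put in perspective by converting them to parallel efficiency. The sketch below is illustrative only: it assumes the speedups are measured against a single-node baseline (the abstract does not state the baseline), and the names `reported_speedup` and `parallel_efficiency` are hypothetical, not taken from the thesis.

```python
# Kernel speedups reported in the abstract, keyed by node count.
reported_speedup = {4: 2.25, 8: 2.78, 16: 5.07}


def parallel_efficiency(speedup, nodes):
    """Return speedup divided by node count.

    Assumption: the speedup is relative to a one-node run, so an
    efficiency of 1.0 would mean ideal linear scaling.
    """
    return speedup / nodes


# Efficiency for each reported configuration.
efficiency = {
    nodes: parallel_efficiency(speedup, nodes)
    for nodes, speedup in reported_speedup.items()
}

for nodes, value in efficiency.items():
    print(f"{nodes} nodes: speedup {reported_speedup[nodes]}, efficiency {value:.3f}")
```

Under that assumption the efficiency falls from roughly 56% on 4 nodes to about 35% on 8 and 32% on 16, which is consistent with the abstract's point that remote memory access is the main obstacle to multi-node scaling.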
Keywords/Search Tags:High performance computing, Galaxy group finding, Parallel computing, UPC, OpenMP