Research On Performance Optimizations Of Stencil Computations On Domestic Heterogeneous Many-core Processor

Posted on: 2022-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhu
GTID: 2518306521457574
Subject: Computer Science and Technology
Abstract/Summary:
Stencil computation is a common computing pattern in scientific computing and engineering applications. Its computation and memory access costs grow linearly with problem size, which makes it well suited to parallel implementation on high-performance computers. The Sunway TaihuLight supercomputer, independently developed and designed in China, is equipped with the domestic SW26010 many-core processor and was the world's first supercomputer with performance of more than 100 Pflop/s. To make full use of domestic supercomputers in accelerating scientific applications, performance optimization of Stencil computation is very important. However, the limited memory access bandwidth of domestic heterogeneous many-core processors makes Stencil computation harder to optimize, and a great deal of manual tuning is required during optimization. To enable in-depth optimization and rapid deployment of Stencil computation applications on domestic heterogeneous many-core processors, this thesis studies performance optimization techniques for Stencil computation on these processors. The main work and contributions are as follows:

1. Design and implementation of a parallel optimization strategy for Stencil computation on domestic heterogeneous many-core processors. Based on the architecture and storage characteristics of these processors, the main performance bottlenecks in the parallel optimization of Stencil computation are analyzed. The thesis designs a parallel optimization strategy that combines data partitioning adapted to the LDM, overlapped tiling, double buffering, temporal tiling, and related techniques. This strategy addresses the limited LDM capacity, the limited memory access bandwidth, and inefficient data reuse. The 2D-5P, 2D-9P, 3D-7P, and 3D-27P Stencil examples were selected for optimization experiments on a single core group of the SW26010 processor, and the highest speedup reached 132.05.

2. An analytical performance model of Stencil computation for domestic heterogeneous many-core processors is proposed. The algorithmic characteristics and memory access pattern of the parallel Stencil program are analyzed, and the relationship between key performance parameters and program running time is quantified. The overlap of computation time and DMA time under double buffering is then analyzed to establish the performance model. An analytical model for three-dimensional Stencil computation was established and verified on the 3D-7P and 3D-27P applications. The average error of the proposed performance model was about 10.97% on the SW26010 heterogeneous many-core processor.

3. An adaptive tiling size algorithm for Stencil computation on domestic heterogeneous many-core processors is proposed. Based on the performance bottleneck analysis, the spatial and temporal tiling size parameters are adjusted by measuring the cost of redundant data transmission and redundant computation in the overlapped tiling method, yielding a theoretically optimal tiling size. Combining the performance model with this tiling size selection gives the adaptive tiling size algorithm. The effectiveness of the algorithm was verified on the 3D-7P and 3D-27P Stencil computation applications.
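As an illustration of the overlapped tiling and temporal tiling ideas named in contribution 1, the following is a minimal pure-Python sketch, not the thesis's SW26010 implementation. It assumes a 2D-5P Jacobi stencil with fixed boundary cells and equal weights of 0.2; the function names (`step`, `naive`, `overlapped_tiled`) and the weights are hypothetical choices made for this example. Each tile is copied together with a halo as wide as the number of fused time steps (a stand-in for a DMA transfer into local memory), advanced several steps locally, and only its interior is written back. The halo cells are the redundant data transfer and redundant computation that the abstract says the tiling size algorithm weighs.

```python
def step(grid):
    """One Jacobi sweep of a 2D 5-point stencil; edge cells are held fixed."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.2 * (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                               + grid[i][j - 1] + grid[i][j + 1])
    return out


def naive(grid, steps):
    """Reference: sweep the whole grid once per time step."""
    for _ in range(steps):
        grid = step(grid)
    return grid


def overlapped_tiled(grid, steps, tile):
    """Advance `steps` time steps tile by tile, using overlapped halos.

    Each tile is extended by `steps` cells on every side (clamped to the
    grid), so stale values at the edge of the local copy never reach the
    tile interior within `steps` sweeps.
    """
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    halo = steps  # one halo layer is consumed per fused time step
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            i1, j1 = min(i0 + tile, n), min(j0 + tile, m)
            a0, a1 = max(i0 - halo, 0), min(i1 + halo, n)
            b0, b1 = max(j0 - halo, 0), min(j1 + halo, m)
            # Copy tile plus halo (analogous to loading it into local memory).
            local = [row[b0:b1] for row in grid[a0:a1]]
            for _ in range(steps):
                local = step(local)
            # Write back only the tile interior; halo results are discarded.
            for i in range(i0, i1):
                out[i][j0:j1] = local[i - a0][j0 - b0:j1 - b0]
    return out


if __name__ == "__main__":
    g = [[float((7 * i + 3 * j) % 11) for j in range(10)] for i in range(12)]
    ref = naive(g, 3)
    til = overlapped_tiled(g, 3, 4)
    worst = max(abs(x - y) for r, t in zip(ref, til) for x, y in zip(r, t))
    print("max abs difference:", worst)
```

In this sketch, fusing more time steps per tile reduces the number of times each tile is loaded but widens the halo, so the redundant work per tile grows; that trade-off is the reason a tiling size has to be chosen rather than simply maximized.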
Keywords/Search Tags: Stencil computation, domestic heterogeneous many-core processor, overlapped tiling, performance model, tiling size