Parallel Finite Difference Method Based On Heterogeneous Multicore Processors

Posted on:2021-03-22

Degree:Master

Type:Thesis

Country:China

Candidate:H B Chen

Full Text:PDF

GTID:2370330611973241

Subject:Computer Science and Technology

Abstract/Summary:


The finite difference method is a core algorithm for solving partial differential equations. The "Sunway TaihuLight" supercomputer is the world's first system with a performance of more than 100 PFlop/s, and it integrates 40,960 SW26010 heterogeneous multi-core processors. The SW26010 has a distinctive architecture, and at present no optimization scheme for finite difference computation fully exploits it. To improve the efficiency of the finite difference method, this thesis uses the "Sunway TaihuLight" system to study the low computational efficiency of finite-difference code in seismic wave forward modeling and in the Community Earth System Model (CESM). Based on the architectural characteristics of the SW26010, the main performance bottlenecks of the finite difference algorithm are analyzed, and a multi-level parallel optimization of the method on "Sunway TaihuLight" is studied. Three problems are targeted: the low efficiency of MPI message passing between processes, the inefficient use of processor memory bandwidth, and the inability of the LDM (local data memory) to hold the data of large-scale problems. To address them, the thesis studies MPI, Sunway Athread, and SIMD vectorization, and designs a multi-level heterogeneous parallel optimization strategy that combines longitudinal data partitioning, chained communication, a 2.5-D pipeline, bundled communication, and asynchronous communication. The main optimization work is as follows:

(1) A two-level parallel optimization scheme for staggered-grid finite-difference seismic wave modeling. To reduce the time spent on MPI message passing in the first level of parallelism, the data distribution is repartitioned, which effectively reduces the number of messages and improves first-level parallel efficiency. The first-level scheme alone has a memory footprint too large for large-scale models, and the second-level parallel strategy effectively alleviates this. In the second level, the limited bandwidth between the processor and main memory means that the computing cores stall on data access and cannot reach their full computing performance. Data is therefore loaded from main memory into the LDM by DMA through the Sunway Athread thread library, and a chained data-reading strategy is designed so that DMA reaches its maximum efficiency. When the data grows to three dimensions, however, the LDM can no longer meet the computational requirements. The thesis proposes a 2.5-D pipeline method to relieve this storage pressure, and the 2.5-D pipeline also makes an asynchronous communication scheme easier to implement. In seismic acoustic forward modeling, a multi-level heterogeneous parallel performance test with 128 processes and 8,192 threads achieved a final speedup of 1250.97 times.

(2) A multi-level parallel optimization scheme for finite-difference-related functions in CESM. Vectorized parallelism is added to the previous optimization strategy to improve parallel efficiency. The thesis studies the register communication channel inside the processor and designs a bundled communication strategy, which alleviates the conflict between the communication-first and storage-first strategies. Vshff-based data permutation is studied in the vectorized code to reduce the cost of packing vector data. In parallel performance tests on a single core group, two functions whose core computation is a finite difference achieved speedups of 9.9 times and 21.2 times respectively.

Based on these two optimization strategies, the thesis studies the parallel optimization of intensive algorithms, represented by the finite difference method, on the "Sunway TaihuLight" supercomputer. Testing shows that the proposed optimizations effectively alleviate the bandwidth bottleneck caused by the hardware design and achieve a strong acceleration effect, which lays a foundation for porting other algorithms to "Sunway TaihuLight" in the future.
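The abstract does not give the details of the 2.5-D pipeline, so the following is only a minimal sketch of the general idea, under stated assumptions: a plain second-order 7-point Laplacian stands in for the thesis's staggered-grid stencil, a three-plane Python buffer stands in for the LDM, and `load_plane` is a hypothetical placeholder for a DMA read. All function names are illustrative and are not taken from the thesis or from the Athread library.

```python
def laplacian_direct(u, nx, ny, nz):
    """Reference version: the whole 3D grid is assumed to be resident."""
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for k in range(1, nz - 1):
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                out[k][j][i] = (u[k - 1][j][i] + u[k + 1][j][i]
                                + u[k][j - 1][i] + u[k][j + 1][i]
                                + u[k][j][i - 1] + u[k][j][i + 1]
                                - 6.0 * u[k][j][i])
    return out


def load_plane(u, k):
    """Hypothetical stand-in for a DMA read of one z-plane into local memory."""
    return [row[:] for row in u[k]]


def laplacian_2p5d(u, nx, ny, nz):
    """Same stencil, but only three z-planes are held locally at any time."""
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    if nz < 3:
        return out  # no interior planes to update
    below, mid = load_plane(u, 0), load_plane(u, 1)
    for k in range(1, nz - 1):
        above = load_plane(u, k + 1)  # exactly one new plane per step
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                out[k][j][i] = (below[j][i] + above[j][i]
                                + mid[j - 1][i] + mid[j + 1][i]
                                + mid[j][i - 1] + mid[j][i + 1]
                                - 6.0 * mid[j][i])
        below, mid = mid, above  # slide the window; nothing is reloaded
    return out


def make_grid(nx, ny, nz):
    """Test field u = i^2 + 2 j^2 + 3 k^2, stored as u[k][j][i]."""
    return [[[float(i * i + 2 * j * j + 3 * k * k) for i in range(nx)]
             for j in range(ny)] for k in range(nz)]
```

Calling `laplacian_2p5d(make_grid(5, 4, 6), 5, 4, 6)` gives the same result as `laplacian_direct` while keeping only three planes in the local buffer. In a real implementation the read of plane `k + 1` could plausibly be issued asynchronously while plane `k` is being computed, which may be the kind of overlap the abstract has in mind when it says the 2.5-D pipeline eases asynchronous communication.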

Keywords/Search Tags:

High performance computing, Heterogeneous multi-core processor, Multi-stage heterogeneous parallel, Finite difference method


Related items

1	Research On Global Sequence Alignment For Intel Multi-core And Many-core Platforms
2	Research On Heterogeneous Multi-Core Processor For Cell Image Processing
3	Parallel Collaborative Algorithm For Large-Scale LBM Multiphase Flow On Heterogeneous Many-Core Platform
4	Research And Implementation Of Parallel Computer Architecture For Graph Search
5	Research On OpenMP 4.0 Based Heterogeneous Parallel Computing Techniques For CFD Applications
6	Research On The CPU/GPU Heterogeneous Parallel Algorithm For The Method Of Characteristics Solution Of Whole-Core Neutron Transport Calculation
7	Implementation And Optimization Of Neural Network-Based Quantum Many-Body Simulation On Sunway Supercomputing Platform
8	Cosmological N-body Simulation On A Many-core Architecture
9	Porting And Optimization Of OpenFOAM For Multi-heterogeneous Platforms
10	Parallel Optimization Of LBM Algorithm Based On CPU-GPU Heterogeneous System