Many-core Parallel Computing For Typical Implicit CFD Methods

Posted on:2014-07-18

Degree:Master

Type:Thesis

Country:China

Candidate:L Deng


GTID:2180330479479102

Subject:Computer Science and Technology

Abstract/Summary:


As a powerful analytical tool, computational fluid dynamics (CFD) is widely applied in aerospace, weather forecasting, shipbuilding, weapons development, and other fields. As high-order, large-scale, real-time CFD simulations demand ever more computation and memory, efficient large-scale parallel computing has become an inevitable trend. With the recent rapid development of many-core technology, heterogeneous many-core architectures are becoming the mainstream architecture for high-end supercomputers. Compared with traditional homogeneous systems, heterogeneous many-core systems combine general-purpose computing, high performance, and a high performance/price ratio, which makes them well suited to scientific and engineering computing. However, their complex hardware and programming models also make it difficult for developers to build CFD applications.

Based on a real finite-volume CFD application, this thesis implements and optimizes many-core parallel computing of typical implicit CFD methods on the Graphics Processing Unit (GPU) and the Intel Many Integrated Core (MIC) architecture. The main contributions are as follows:

(1) The architectural features and programming environments of these two mainstream many-core processors are introduced in detail, the performance optimization methods for each platform are summarized, their similarities and differences are compared and analyzed from both the hardware and software perspectives, and the learning curves for programming and optimization on each architecture are described based on the author's own experience.

(2) The basic theory, algorithmic procedure, and data dependences of typical implicit CFD methods are analyzed in depth. Two GPU parallel programming methods are proposed to cover their different computational procedures: grid-point-level parallelism using 3D GPU thread blocks, and grid-line-level parallelism using 2D GPU thread blocks. GPU parallel versions of the ADI and JACOBI methods are implemented and tested on real structured-grid cases of different scales. The results show speedups of 10.3 and 14.25, respectively, indicating that the JACOBI method is much better suited to GPU parallel computing.

(3) The OpenMP parallel performance of LU-SGS, ADI, and JACOBI on the MIC platform is analyzed in depth using the LIKWID performance analysis tool, and an optimization method based on micro-architectural hardware metrics is proposed, which helps to build a comprehensive understanding of how cache and SIMD affect application performance. Taking the JACOBI method as an example, the speedup obtained by optimizing an idealized demo differs from that obtained in the actual application. The program's runtime hardware metrics are collected and analyzed with LIKWID, giving a rational explanation for the differences in speedup under various circumstances. The results show that, for a single-block case with 2 million grid points, the optimized JACOBI method achieves a speedup of 17.54 on MIC relative to a single CPU core.
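
The difference in data dependence between the JACOBI and ADI methods, which motivates the grid-point-level and grid-line-level GPU mappings described above, can be illustrated with a minimal sketch. This is not the thesis's code: it assumes a 1D Poisson-type model problem in place of the real implicit CFD operators, it runs serially in plain Python rather than on a GPU or MIC, and the function names (`jacobi_sweep`, `thomas_solve`, `adi_line_sweep`) are hypothetical. It only shows which loops carry dependences and which do not.

```python
# Illustrative sketch only: a simple model problem stands in for the
# implicit CFD operators of the thesis; all names here are hypothetical.

def jacobi_sweep(u, f, h):
    """One Jacobi sweep for -u'' = f with fixed end values.

    Each interior point reads only the OLD array `u`, so every update is
    independent of the others. This is the property that allows one GPU
    thread per grid point (grid-point-level parallelism).
    """
    new = list(u)
    for i in range(1, len(u) - 1):  # iterations are mutually independent
        new[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return new


def thomas_solve(a, b, c, d):
    """Solve one tridiagonal system along a grid line (Thomas algorithm).

    a: sub-diagonal (a[0] unused), b: diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    Forward elimination and back substitution are recurrences, so the
    work along a single line is inherently sequential.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # step i depends on step i - 1
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # step i depends on step i + 1
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def adi_line_sweep(rhs_lines, a, b, c):
    """Solve one tridiagonal system per grid line of an ADI-style sweep.

    Different lines do not depend on each other, so the outer loop could
    be distributed with one thread per line (grid-line-level parallelism),
    while each thread still walks its own line sequentially.
    """
    return [thomas_solve(a, b, c, d) for d in rhs_lines]
```

In this sketch the Jacobi sweep exposes one independent task per grid point, whereas the ADI-style sweep exposes only one per grid line. That difference in available parallelism is consistent with the abstract's finding that JACOBI suits the GPU better, although the measured speedups naturally depend on much more than this.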

Keywords/Search Tags:

LU-SGS, ADI, JACOBI, GPU Parallelization, MIC Parallelization


Related items

1	Researches On Parallelization Methods Of LP^MLN Reasoning
2	Group Theory Based Data Dependence Model For Loop Parallelization
3	Parallelization Of Graph Algorithms
4	Research Of Hybrid Parallel System For Dynamic Programming Parallelization In MPP Environment
5	Research On Compression Of Large Collections Of Genomes And Its Parallelization Algorithm
6	Research And Application Of Parallelization Of Community Discovery Algorithm Based On Spark
7	Research On Parallelization Of Spatial Data Mining Clustering Algorithm Based On SPARK
8	Research On VCF Format Genomic Data Compression And Parallelization Based On Domestic Bigdata All-in-one Machine
9	Research And Application Of The Algorithm And Parallelization Scheme For Hybrid ETKE-4DVAR Data Assimilation
10	Parallelization Of Smoothed Particle Hydrodynamics Method And Its Application