
Research On Parallel Computing For Multi-block Structured CFD On Multi-/Many-Core Processors

Posted on: 2017-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: N B Guo
Full Text: PDF
GTID: 2370330569498583
Subject: Computer Science and Technology
Abstract/Summary:
The development of computational fluid dynamics (CFD) is closely tied to the development of computer technology. The emergence of new multi-/many-core processors presents both opportunities and challenges for large-scale parallel applications, including CFD. Traditional parallel algorithms adapt poorly to the new forms of parallelism in multi-/many-core architectures, and their parallel scalability has become a bottleneck. Designing efficient and scalable parallel algorithms that match the characteristics of practical CFD applications to the features of multi-/many-core architectures has therefore become an important topic in CFD parallel computing. This thesis targets current multi-/many-core processor platforms and studies scalable parallel algorithms for complex high-order CFD on multi-block structured grids. The main contributions are:

1) A scalable parallel algorithm for LU-SGS on multi-/many-core processors. The implicit LU-SGS method is popular in engineering CFD applications, but its strong data dependencies make it difficult to parallelize. This thesis proposes a two-level pipelined parallel algorithm, TL-Pipeline (Two-level Pipeline), which relieves the bottleneck of the traditional shared-memory LU-SGS parallel algorithm on new many-core platforms by reducing pipeline overhead and improving load balance among threads. The algorithm is implemented with nested OpenMP in an in-house complex high-order CFD code. On a dual-socket Intel Xeon E5-2692 v2 multi-core node (24 threads) and a Xeon Phi 31S1P many-core processor (57 cores, up to 228 threads), TL-Pipeline improves performance over the traditional pipelined parallel algorithm by factors of up to 1.42 and 7.80, respectively.

2) Building on this, a block-based two-level pipelined LU-SGS parallel algorithm, BTL-Pipeline (Block-based TL-Pipeline), which exploits the parallelism across the blocks of a multi-block grid. For LU-SGS, the speedups of BTL-Pipeline over TL-Pipeline on the above Intel Xeon and Xeon Phi platforms are up to 2.06 times and 7.42 times, respectively. The block-based idea is further applied to the OpenMP parallelization of the right-hand-side computation (including the viscous and convective terms) of the CFD code. Compared with traditional OpenMP parallelism within a single block, the speedups on the same Intel Xeon and Xeon Phi platforms are up to 1.49 times and 2.06 times, respectively.

3) Efficient vectorization of a complex CFD kernel. Current multi-/many-core processors generally have 256-bit or even 512-bit wide vector units, but for complex CFD computational kernels the automatic vectorization performed by compilers is usually inefficient. In this thesis, the fifth-order WCNS interpolation stencil in the high-order structured-grid CFD software is vectorized by hand using intrinsics, which greatly improves vectorization efficiency. For double-precision computation, the parallel efficiency is raised 2.01 times on the 256-bit Intel Xeon E5-2692 v2 processor and 7.60 times on the 512-bit Xeon Phi 31S1P processor.
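The data dependency that makes an LU-SGS-style sweep hard to parallelize, and the basic idea of pipelining it across threads, can be illustrated with a small sketch. This is not the thesis's TL-Pipeline or BTL-Pipeline algorithm. It is a generic single-level pipeline on a 2D grid, written in plain Python as a stand-in for OpenMP threads. The update formula, the function names (`sweep_cell`, `sequential_sweep`, `pipelined_sweep`), and the choice to split the `j` direction into chunks are all assumptions made for illustration.

```python
def sweep_cell(x, b, i, j):
    """Forward-sweep update (hypothetical formula): cell (i, j) needs the
    already-updated neighbours (i-1, j) and (i, j-1)."""
    prev_i = x[i - 1][j] if i > 0 else 0.0
    prev_j = x[i][j - 1] if j > 0 else 0.0
    x[i][j] = b[i][j] + 0.25 * (prev_i + prev_j)


def sequential_sweep(b):
    """Reference sweep in lexicographic order."""
    ni, nj = len(b), len(b[0])
    x = [[0.0] * nj for _ in range(ni)]
    for i in range(ni):
        for j in range(nj):
            sweep_cell(x, b, i, j)
    return x


def pipelined_sweep(b, num_workers):
    """Simulated pipeline: worker t owns one chunk of the j direction and
    handles row i at step s = i + t, one step after worker t-1 did."""
    ni, nj = len(b), len(b[0])
    x = [[0.0] * nj for _ in range(ni)]
    chunk = (nj + num_workers - 1) // num_workers  # ceiling division
    steps = 0
    for s in range(ni + num_workers - 1):
        # The tasks inside this loop are mutually independent, so real
        # threads could run them concurrently with a barrier per step.
        for t in range(num_workers):
            i = s - t
            if 0 <= i < ni:
                for j in range(t * chunk, min((t + 1) * chunk, nj)):
                    sweep_cell(x, b, i, j)
        steps += 1
    return x, steps


if __name__ == "__main__":
    b = [[float(7 * i + j) for j in range(8)] for i in range(6)]
    x_pipe, steps = pipelined_sweep(b, num_workers=4)
    print(x_pipe == sequential_sweep(b), steps)
```

With `ni` rows and `T` workers, the pipeline needs `ni + T - 1` steps, of which the first and last `T - 1` leave some workers idle. This fill and drain cost grows with the thread count, which is the kind of pipeline overhead the abstract says becomes a bottleneck on many-core processors.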
Keywords/Search Tags:multi-/many-core processors, structured grid CFD, LU-SGS, parallel computation, vectorization