
Research On Performance Optimization And Parallel Technology For Compressible NS Equation Software

Posted on: 2019-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: S Dong
Full Text: PDF
GTID: 2370330623450595
Subject: Computer Science and Technology
Abstract/Summary:
Computational fluid dynamics (CFD) is an important discipline that combines fluid mechanics, mathematics, and computer science. It is widely used in fluid engineering fields such as aviation, automotive design, respiration and blood flow, and chemical engineering. As computational accuracy improves and problem scale grows, the computational complexity of CFD applications has increased dramatically. Performance optimization and large-scale parallel computing of CFD applications on modern high-performance computers have therefore become important issues. CNS (Compressible Navier-Stokes) is a CFD application that solves the compressible Navier-Stokes equations, which are important in fluid mechanics. It is well suited to numerical simulation in aeronautics and aerodynamics. In practice, CNS needs to be optimized to be more useful. Targeting high-performance multi-core and many-core platforms, this thesis optimizes the performance of the CNS program and develops the related parallel techniques. The work is divided into three parts:

(1) On a general-purpose CPU platform, the performance of the serial program is improved. Because memory access is intensive, the cache hit rate is low, and the hot subroutine cannot be vectorized automatically by the compiler, the program is optimized in terms of compiler options, memory access, numerical calculation, and so on. Tested on an Ivy Bridge processor, performance increases by 1.49 times. Because the Fortran 90 code cannot be vectorized automatically, it must first be rewritten in C, and the rewrite must introduce as little overhead as possible. On the Ivy Bridge processor, the program is tested at two grid sizes. The speedup ratio is 1.43 in both cases, while the overall performance of the program increases by 10%.

(2) To make better use of multi-core processors, multi-threaded parallelism within a node is implemented for the CNS program using the OpenMP programming model on a computing node comprising two 12-core CPUs. First, because the hot loops are scattered, this thesis iteratively identifies them and implements loop-level parallelism. Then, to reduce the overhead of creating parallel regions, the parallel region is enlarged to enclose the entire rk4 subroutine, which increases the proportion of parallel execution time from 62.5% to 84.4%. Finally, because CPU cores on a NUMA architecture access local memory faster, system functions are used to bind threads to processor cores, while the SPMD programming method is used to bind threads to data blocks. After this two-level binding, the speedup with 24 threads increases from the original 5.6 to 6.5.

(3) To exploit the acceleration offered by the MIC coprocessor, the CNS program is ported to a heterogeneous CPU+MIC configuration on a computing node consisting of 2 CPUs and 2 MIC cards. Because the initial performance is not ideal, the port is optimized in terms of data transmission and load balancing, reaching a speedup of 3.6 relative to the version before optimization.
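The parallel-region enlargement described in part (2) can be illustrated with a minimal sketch. This is not the CNS code. The function names, the array size `N`, the stage count `STAGES`, and the toy update `u += dt * rhs` are all hypothetical stand-ins for a multi-stage routine such as rk4. The first version opens a new parallel region for every loop, and the second opens a single region that encloses all stages and uses work-sharing loops inside it.

```c
#define N 1000      /* hypothetical grid size, not from the thesis */
#define STAGES 4    /* hypothetical number of stages per time step */

/* Version A: one parallel region per loop, so the thread team is
   forked and joined twice per stage. */
static void step_per_loop(double *u, double *rhs, double dt) {
    for (int s = 0; s < STAGES; ++s) {
        #pragma omp parallel for
        for (int i = 0; i < N; ++i)
            rhs[i] = -u[i];            /* toy right-hand side */
        #pragma omp parallel for
        for (int i = 0; i < N; ++i)
            u[i] += dt * rhs[i];       /* toy stage update */
    }
}

/* Version B: a single parallel region encloses every stage. Each
   "omp for" only shares work among the existing threads and ends
   with an implicit barrier. */
static void step_one_region(double *u, double *rhs, double dt) {
    #pragma omp parallel
    {
        /* s is declared inside the region, so it is private per thread */
        for (int s = 0; s < STAGES; ++s) {
            #pragma omp for
            for (int i = 0; i < N; ++i)
                rhs[i] = -u[i];
            #pragma omp for
            for (int i = 0; i < N; ++i)
                u[i] += dt * rhs[i];
        }
    }
}

/* Runs both versions on identical data and returns the largest
   absolute difference between their results. */
double compare_versions(void) {
    static double ua[N], ub[N], ra[N], rb[N];
    for (int i = 0; i < N; ++i)
        ua[i] = ub[i] = 1.0 + 0.001 * i;

    step_per_loop(ua, ra, 0.1);
    step_one_region(ub, rb, 0.1);

    double max_diff = 0.0;
    for (int i = 0; i < N; ++i) {
        double d = ua[i] - ub[i];
        if (d < 0.0) d = -d;
        if (d > max_diff) max_diff = d;
    }
    return max_diff;
}
```

Both versions compute the same values; only the number of fork/join events differs. With GCC the pragmas take effect when compiling with `-fopenmp`, and without that flag they are ignored and the code runs serially.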
Keywords/Search Tags: C/Fortran mixed programming, intrinsic vectorization, SPMD programming, CPU+MIC heterogeneous transplantation