Research On Efficient Implicit Solver And Many-core Parallel Optimization For Structured High-order CFD

Posted on: 2018-02-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D L Li
GTID: 1360330569498478
Subject: Computer Science and Technology
Abstract/Summary:
High-order accuracy schemes offer low dispersion, low dissipation, high resolution, precise results and fine flow-field structures, and are therefore important in numerical simulations of complex flow problems such as aeroacoustics, turbulence and transition. High-order simulation of problems with multi-scale physics, complex geometry and enormous meshes is extremely expensive in computation and time, which makes efficient implicit solvers and the related parallel computing techniques an urgent need. Traditional implicit solvers developed for low-order CFD suffer from a stencil mismatch problem in high-order CFD applications, which leads to a low convergence rate and poor robustness. The Jacobian-free Newton-Krylov (JFNK) method combines the Newton nonlinear method, which has a superlinear convergence property, with the Krylov subspace method for solving large-scale sparse linear systems; traditional implicit linear solvers can be used as preconditioners to improve the convergence rate of JFNK. JFNK approximates the matrix-vector multiplication by a finite difference quotient (the matrix-free technique), which avoids directly computing and storing the Jacobian matrix and makes the method particularly attractive for high-order CFD. However, a preconditioned JFNK solver is much more complex than traditional implicit solvers. Its efficiency depends on the specific implementation, an appropriate preconditioner and the configuration of numerous algorithmic parameters, which restricts its application in high-order CFD.

The emergence of many-core processors and the adoption of wide vector processing units (VPUs) continue to raise the performance of supercomputers. On the other hand, abundant parallel resources lead to fine-grained parallelism and many dimensions of optimization, and hence pose tough challenges for the parallel optimization of high-order CFD applications. On many-core processors with hundreds of threads, the parallel computation of LU-SGS implicit solvers (preconditioners), which have an inherently strong data dependence, suffers a severe scalability crisis. Meanwhile, the SIMD parallel potential of the VPU must be exploited to improve the computational efficiency of applications.

In this work, based on the Tianhe-2 many-core supercomputer and the domestic high-order weighted compact nonlinear schemes (WCNS), we study efficient implicit solving and parallel computing for structured high-order CFD, and further apply the results to the simulation of practical compressible aerodynamics. The main contributions are as follows.

For structured high-order CFD applications, we derive the numerical model of JFNK for the finite difference Navier-Stokes equations and propose a preconditioned JFNK nonlinear solving algorithm for high-order CFD. The algorithm combines the inexact Newton method with the restarted generalized minimal residual (R-GMRES) method and uses various linear solvers as preconditioners. For the WCNS-based in-house high-order CFD software, we present the implementation flowchart of JFNK, in which we reuse the original key modules, including the linear solvers and right-hand-side kernels, as far as possible. We validate the JFNK solver and evaluate its convergence rate using compressible viscous steady flow simulations. The preconditioned JFNK solver with baseline configurations outperforms LU-SGS by factors of 2× to 4.3× in convergence rate and retains excellent robustness.

For the efficient solving of high-order CFD, we systematically and extensively evaluate the impact of various preconditioners (LU-SGS, PR-SGS, Jacobi iteration) and algorithmic parameters (CFL number, restart steps, relative and absolute residual thresholds) on the convergence and robustness of JFNK, in order to guide its configuration in realistic CFD applications. We also propose a new equivalent timesteps (ETs) performance metric, which accumulates the nonlinear function evaluations and preconditioning operations into an equivalent number of linear solver timesteps, so that convergence rates can be compared objectively and fairly. With these tunings, the convergence rate of the preconditioned JFNK solver can be enhanced by an order of magnitude over the traditional LU-SGS linear solver.

LU-SGS is a popular, efficient implicit solver (preconditioner) with an inherently strong data dependence, which causes a severe scalability drop for the pipelined parallel (P_LU-SGS) algorithm on emerging many-core processors. We propose a novel two-level pipelined parallel LU-SGS (TLP_LU-SGS) algorithm to fully exploit the parallelism of 3D LU-SGS, and establish a unified performance evaluation model. Theoretical analysis and test results show that the TLP_LU-SGS algorithm effectively improves the performance of the pipeline and gains speedups of up to 1.32× and 3.7× over the traditional P_LU-SGS on Xeon CPUs and Xeon Phi, respectively. Considering the multi-block nature of structured CFD, we further propose a multi-block parallel BTLP_LU-SGS algorithm, which achieves performance boosts of nearly 2× and 3× over TLP_LU-SGS for multi-block cases on Xeon CPUs and Xeon Phi, respectively.

To exploit the wide VPU capability, we vectorize and optimize the computational hot spots in the high-order CFD program. We optimize the WCNS-E-5 nonlinear interpolation kernel (the most time-consuming one) through loop fusion, data layout optimization and rewriting with vector intrinsics, and achieve SIMD speedups of 2× and 4.5× on Xeon CPUs and Xeon Phi, respectively. For the implicit LU-SGS algorithm, we propose a novel multi-thread and SIMD hybrid parallel (PSIMD_LU-SGS) algorithm. PSIMD_LU-SGS uses the fine-grained data parallelism of the hyper-plane parallel LU-SGS algorithm to enable SIMD acceleration on the VPU and obtains a 2.3× SIMD speedup over P_LU-SGS on Xeon CPUs.
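
The matrix-free technique mentioned in the abstract, where JFNK replaces the Jacobian-vector product by a finite difference quotient of the nonlinear residual, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the dissertation: the toy two-equation `residual` stands in for a discretized flow residual, and the step-size formula in `jacobian_vector_product` is one common textbook choice, not necessarily the one the author uses.

```python
import numpy as np


def residual(u):
    # Hypothetical toy nonlinear system F(u), standing in for a CFD residual.
    return np.array([u[0] ** 2 + u[1] - 3.0,
                     u[0] + u[1] ** 2 - 5.0])


def jacobian_vector_product(F, u, v, eps=1e-7):
    # Approximate J(u) @ v by a forward difference quotient:
    #   J(u) v ~ (F(u + h v) - F(u)) / h
    # The Jacobian J is never formed or stored.
    norm_v = np.linalg.norm(v)
    if norm_v == 0.0:
        return np.zeros_like(u)
    # Assumed step-size heuristic: scale the perturbation with |u| and |v|.
    h = eps * (1.0 + np.linalg.norm(u)) / norm_v
    return (F(u + h * v) - F(u)) / h


u = np.array([1.0, 2.0])
v = np.array([0.5, -1.0])

approx = jacobian_vector_product(residual, u, v)

# Analytic Jacobian of the toy system, used only to check the approximation.
exact = np.array([[2.0 * u[0], 1.0],
                  [1.0, 2.0 * u[1]]]) @ v
```

In a Krylov method such as GMRES, this product is the only way the Jacobian enters the iteration, so each Krylov step costs one additional residual evaluation instead of a stored-matrix multiplication.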
Keywords: High-order CFD, JFNK, LU-SGS, Parallel computing, Many-core processor, Vectorization