Implementations Of Electromagnetic Field Integral Equation Algorithms On GPU/CPU Heterogeneous Platform

Posted on: 2017-01-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Mu
GTID: 1220330491463136
Subject: Electromagnetic field and microwave technology
Abstract/Summary:
The numerical analysis of electrically large objects has long been a hot topic in computational electromagnetics. When the electrical size of an object is large, an ordinary computer can hardly handle the task. To overcome this problem, parallel computing has been introduced into the algorithms of computational electromagnetics. Early parallel algorithms were implemented on a multicore CPU with OpenMP or on a cluster of computers with MPI. In recent years, a new type of large-scale parallel processor, the graphics processing unit (GPGPU or GPU), has been introduced into the parallel computing field, bringing a leap in parallel computing efficiency and opening up a new research direction for computational electromagnetics. In this Ph.D. dissertation, the implementations of several electromagnetic field integral equation based algorithms on a GPU/CPU heterogeneous platform are investigated. The major innovations are as follows:

1. An optimized implementation of the multilevel fast multipole algorithm (MLFMA) on a GPU/CPU heterogeneous platform is proposed. This implementation consists of four parts:
   1) For the near-field matrix filling, an optimized GPU/CPU cooperative computing scheme is designed;
   2) A highly efficient algorithm for sparse matrix-vector products is proposed; on average it is about 2.5 times as efficient as the commercial GPU library NVIDIA cuSPARSE;
   3) A warp-level parallel scheme is proposed to replace the thread-level parallel scheme for the aggregation/disaggregation on the finest level of the MLFMA;
   4) A texture memory scheme is proposed to replace global memory for the 2D local interpolation/anterpolation in the aggregation/disaggregation on the coarser levels of the MLFMA, optimizing memory access and significantly improving the efficiency of the 2D local interpolation/anterpolation.
   Compared with the existing GPU-based algorithm in the literature, the proposed algorithm increases the computational efficiency by 25%.

2. An optimized implementation of the adaptive cross approximation (ACA) algorithm on a multi-GPU platform is proposed. This implementation consists of three parts:
   1) For the near-field matrix filling, a mixed-precision computing method is adopted, which both increases the computational efficiency by 100% and keeps the computational error within a controllable range;
   2) For the matrix compression, a threadblock-level parallel scheme and the dynamic parallel technique are applied to the finest level and to the coarser levels in the MLFMA, respectively;
   3) For the far-field matrix-vector products, a register-reusable scheme for single precision and a double-buffer technique for double precision are proposed to enhance performance, improving the computational efficiency of matrix-vector products by a factor of about 1.2 to 2 compared with the commercial GPU library NVIDIA cuBLAS.
   Compared with the four-core CPU parallel algorithm, the maximum speedup ratio of the matrix compression process is about 82, and that of computing a far-field matrix-vector product is about 30.

3. An optimized implementation of the higher-order method of moments (HMoM) with an out-of-core LU solver is proposed. This implementation consists of three parts:
   1) A GPU-oriented programming scheme and its optimization procedure are proposed, introducing both global and local auxiliary tables to reduce tedious and repetitive calculations;
   2) An overlapping grouping of quadrilateral patches is proposed such that all the sub-matrix blocks can be generated efficiently with the help of video memory and main memory, without any redundant calculation;
   3) A GPU-based out-of-core LU decomposition algorithm is proposed and extended to the GPU/CPU heterogeneous platform.
   Compared with the implementation in the literature, the optimized implementation here achieves a speedup of about 7 to 12.
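For readers unfamiliar with the matrix compression step named in item 2, the following is a minimal serial sketch of the textbook ACA with partial pivoting, followed by the compressed far-field matrix-vector product. It is not the dissertation's GPU implementation. The function names (`aca`, `aca_matvec`), the real-valued 1/distance test kernel, and the tolerance values are all assumptions chosen for illustration; electromagnetic kernels are complex-valued and would need conjugated inner products in the norm update.

```python
import math


def dot(a, b):
    """Plain real-valued inner product."""
    return sum(x * y for x, y in zip(a, b))


def aca(entry, m, n, tol=1e-6, max_rank=None):
    """Partially pivoted ACA (illustrative sketch, real-valued).

    entry(i, j) returns one element of an m-by-n block.
    Returns (us, vs) so that block[i][j] ~= sum_k us[k][i] * vs[k][j].
    """
    if max_rank is None:
        max_rank = min(m, n)
    us, vs = [], []
    unused_rows = set(range(m))
    norm2 = 0.0  # squared Frobenius norm of the current approximation
    i = 0        # first pivot row (hypothetical choice: row 0)
    while len(us) < max_rank and unused_rows:
        unused_rows.discard(i)
        # Residual of row i: original row minus current approximation.
        row = [entry(i, j) - sum(u[i] * v[j] for u, v in zip(us, vs))
               for j in range(n)]
        j = max(range(n), key=lambda c: abs(row[c]))
        pivot = row[j]
        if abs(pivot) < 1e-14:
            # Row already represented; move on to another unused row.
            if not unused_rows:
                break
            i = min(unused_rows)
            continue
        v = [x / pivot for x in row]
        # Residual of column j.
        u = [entry(r, j) - sum(uk[r] * vk[j] for uk, vk in zip(us, vs))
             for r in range(m)]
        term2 = dot(u, u) * dot(v, v)
        norm2 += term2 + 2.0 * sum(dot(u, uk) * dot(v, vk)
                                   for uk, vk in zip(us, vs))
        us.append(u)
        vs.append(v)
        # Stop when the new rank-1 term is small relative to the total.
        if math.sqrt(term2) <= tol * math.sqrt(max(norm2, 0.0)):
            break
        if not unused_rows:
            break
        # Next pivot row: largest entry of u among unused rows.
        i = max(unused_rows, key=lambda r: abs(u[r]))
    return us, vs


def aca_matvec(us, vs, x):
    """Compute y = (sum_k u_k v_k^T) x without forming the full block."""
    y = [0.0] * len(us[0])
    for u, v in zip(us, vs):
        coeff = dot(v, x)
        for r in range(len(y)):
            y[r] += coeff * u[r]
    return y


if __name__ == "__main__":
    # Two well-separated point groups on a line (assumed test geometry).
    src = [0.05 * i for i in range(20)]
    obs = [5.0 + 0.05 * j for j in range(20)]
    us, vs = aca(lambda i, j: 1.0 / abs(obs[j] - src[i]), 20, 20)
    print("rank used:", len(us), "of", 20)
```

In this sketch, storage and matrix-vector cost for an m-by-n block drop from m*n to r*(m+n) for rank r, which is why well-separated (far-field) interactions are the ones worth compressing.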
Keywords/Search Tags: method of moments (MoM), electromagnetic scattering, multilevel fast multipole algorithm (MLFMA), adaptive cross approximation (ACA), higher-order method of moments (HMoM), GPU, CUDA, parallel computing, OpenMP, MPI