Implementations Of Electromagnetic Field Integral Equation Algorithms On GPU/CPU Heterogeneous Platform

Posted on:2017-01-15

Degree:Doctor

Type:Dissertation

Country:China

Candidate:X Mu

Full Text:PDF

GTID:1220330491463136

Subject:Electromagnetic field and microwave technology

Abstract/Summary:

The numerical analysis of electrically large objects has long been an active topic in computational electromagnetics. When the electrical size of an object is large, an ordinary computer struggles to handle the task. To overcome this problem, parallel computing has been introduced into the algorithms of computational electromagnetics. Early parallel algorithms were implemented on a multicore CPU with OpenMP or on a cluster of computers with MPI. In recent years, a new type of large-scale parallel processor, the graphics processing unit (GPGPU or GPU), has been introduced into the parallel computing field, bringing a leap in parallel computing efficiency and opening up a new research direction for computational electromagnetics. In this Ph.D. dissertation, implementations of several electromagnetic field integral equation based algorithms on a GPU/CPU heterogeneous platform are investigated. The major innovations are as follows:

1. An optimized implementation of the multilevel fast multipole algorithm (MLFMA) on a GPU/CPU heterogeneous platform is proposed. This implementation consists of four parts:
1) For the near-field matrix filling, an optimized GPU/CPU cooperative computing scheme is designed;
2) A highly efficient algorithm for sparse matrix-vector products is proposed, whose average efficiency is about 2.5 times that of the commercial GPU library NVIDIA cuSPARSE;
3) A warp-level parallel scheme for the aggregation/disaggregation on the finest level of the MLFMA is proposed to replace the thread-level parallel scheme;
4) A texture memory scheme, used instead of global memory, is proposed for the 2D local interpolation/anterpolation in the aggregation/disaggregation on the coarser levels of the MLFMA. It optimizes memory access and significantly improves the efficiency of the 2D local interpolation/anterpolation.
Compared with the existing GPU-based algorithm in the literature, the proposed algorithm increases the computational efficiency by 25%.

2. An optimized implementation of the adaptive cross approximation (ACA) algorithm on a multi-GPU platform is proposed. This implementation consists of three parts:
1) For the near-field matrix filling, a mixed-precision computing method is adopted, which increases the computational efficiency by 100% while keeping the computational error within a controllable range;
2) For the matrix compression, a threadblock-level parallel scheme and a dynamic parallel technique are applied to the finest level and to the coarser levels in the MLFMA, respectively;
3) For the far-field matrix-vector products, a register-reusable scheme for single precision and a double-buffer technique for double precision are proposed to enhance performance, increasing the computational efficiency of matrix-vector products by about 1.2 to 2 times compared with the commercial GPU library NVIDIA cuBLAS.
Compared with the four-core CPU parallel algorithm, the maximum speedup of the matrix compression process is about 82, and that of computing a far-field matrix-vector product is about 30.

3. An optimized implementation of the higher-order method of moments (HMoM) with an out-of-core LU solver is proposed. This implementation consists of three parts:
1) A GPU-oriented programming scheme with its optimization procedure is proposed, introducing both global and local auxiliary tables to reduce tedious and repetitive calculations;
2) An overlapping grouping of quadrilateral patches is proposed so that all the sub-matrix blocks can be generated efficiently using both video memory and main memory, without any redundant computation;
3) A GPU-based out-of-core LU decomposition algorithm is proposed and extended to the GPU/CPU heterogeneous platform.
Compared with the implementation in the literature, the optimized implementation achieves a speedup of about 7 to 12.
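As an illustration of the matrix compression step named in the ACA contribution, the following is a minimal serial sketch of partially pivoted ACA. It is not the dissertation's multi-GPU implementation. It assumes a real static 1/R kernel between two well-separated point clusters in place of the full electrodynamic Green's function, and the names aca, dot, demo and kernel are hypothetical helpers introduced only for this example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def aca(get_row, get_col, m, n, tol=1e-6):
    """Partially pivoted ACA (illustrative sketch).

    Returns lists us, vs with A[i][j] ~= sum_k us[k][i] * vs[k][j],
    requesting only individual rows and columns of A.
    """
    us, vs = [], []
    used_rows = set()
    norm2 = 0.0  # running estimate of the squared Frobenius norm of the approximation
    i = 0        # first pivot row (arbitrary choice)
    while len(us) < min(m, n):
        used_rows.add(i)
        # Row i of the residual A - sum_k u_k v_k^T.
        row = get_row(i)
        for u, v in zip(us, vs):
            row = [r - u[i] * vj for r, vj in zip(row, v)]
        j = max(range(n), key=lambda c: abs(row[c]))
        if abs(row[j]) < 1e-14:
            # Residual row is numerically zero: try another unused row.
            unused = [r for r in range(m) if r not in used_rows]
            if not unused:
                break
            i = unused[0]
            continue
        v_new = [r / row[j] for r in row]
        # Column j of the residual.
        u_new = get_col(j)
        for u, v in zip(us, vs):
            u_new = [c - ui * v[j] for c, ui in zip(u_new, u)]
        # Update the norm estimate with the new rank-1 term.
        cross = sum(dot(u_new, u) * dot(v_new, v) for u, v in zip(us, vs))
        term2 = dot(u_new, u_new) * dot(v_new, v_new)
        norm2 += 2.0 * cross + term2
        us.append(u_new)
        vs.append(v_new)
        if math.sqrt(term2) <= tol * math.sqrt(max(norm2, 0.0)):
            break
        # Next pivot row: largest entry of the new column among unused rows.
        unused = [r for r in range(m) if r not in used_rows]
        if not unused:
            break
        i = max(unused, key=lambda r: abs(u_new[r]))
    return us, vs


def demo(n=40, tol=1e-6):
    """Compress an n-by-n far-field block; return (rank, relative Frobenius error)."""
    # Assumed geometry: two collinear clusters of n points, about 5 units apart.
    src = [(t / (n - 1), 0.0, 0.0) for t in range(n)]
    obs = [(5.0 + t / (n - 1), 0.5, 0.0) for t in range(n)]

    def kernel(i, j):
        # Static 1/R interaction (assumption; stands in for exp(-jkR)/R).
        return 1.0 / math.dist(obs[i], src[j])

    us, vs = aca(lambda i: [kernel(i, j) for j in range(n)],
                 lambda j: [kernel(i, j) for i in range(n)], n, n, tol)
    err2 = ref2 = 0.0
    for i in range(n):
        for j in range(n):
            exact = kernel(i, j)
            approx = sum(u[i] * v[j] for u, v in zip(us, vs))
            err2 += (exact - approx) ** 2
            ref2 += exact * exact
    return len(us), math.sqrt(err2 / ref2)


if __name__ == "__main__":
    rank, rel_err = demo()
    print("rank:", rank, "relative error:", rel_err)
```

In this sketch a block of rank k is stored as 2kn numbers instead of n^2, and a far-field matrix-vector product reduces to two thin products. Those two steps correspond to the compression and matrix-vector stages that the abstract reports accelerating on GPUs.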

Keywords/Search Tags:

Method of moments (MoM), electromagnetic scattering, multilevel fast multipole algorithm (MLFMA), adaptive cross approximation (ACA), higher-order method of moments (HMoM), GPU, CUDA, parallel computing, OpenMP, MPI

Related items

1	Research On Higher-Order Method Of Moments Based On Bézier Quadrangular Patches
2	Fast And Efficient Algorithm Based On The Multilevel Fast Multipole Method
3	Improved Adaptive Cross Approximation Algorithm For Fast Analysis Of Electromagnetic Scattering / Radiation Problem
4	Higher-Order Hierarchical Vector Basis Functions On Bézier Curved Triangular Surface And Their Applications In The Method Of Moments
5	Based On MPI - Tv University Target Electromagnetic Scattering Of OpenMP Hybrid Parallel Computing Research
6	Research On Electromagnetic Modeling Technology For Scattering Center Imaging
7	Surface Integral Equation Combined With The Parallel MLFMA Analysis Of A Composite Target Of The Conductor Medium Electromagnetic Scattering Problems
8	Research On Higher Order MoM And Fast Algorithms For Electromagnetic Scattering And Radiation From Metallic And Dielectric Targets
9	Based On The Electromagnetic Scattering Analysis Of Laminated Basis Function Method Of Moments
10	Research On RCS Computing Method Based On Parallel Multilevel Fast Multipole Algorithm