Accelerating discontinuous Galerkin method and finite difference method by using multiple GPUs with CUDA

Posted on: 2016-11-20
Degree: Ph.D.
Type: Dissertation
University: University of Wyoming
Candidate: Mu, Dawei
Full Text: PDF
GTID: 1470390017477363
Subject: Geophysics
Abstract/Summary:
Accurate and efficient computer simulations of seismic wave propagation in realistic three-dimensional geological media are becoming increasingly important in seismology for improving our understanding of the earthquake rupture process that generates seismic waves and of the geological medium through which they propagate. However, the accurate and computationally efficient numerical solution of the three-dimensional (visco)elastic seismic wave equation is still a very challenging task, especially when the material properties are complex and the modeling geometry, such as surface topography and subsurface fault structures, is highly irregular.

We have successfully ported two different numerical methods for solving the three-dimensional elastic seismic wave equation from the CPU platform to the GPU platform. The first is the arbitrary high-order discontinuous Galerkin (ADER-DG) method, which was designed for solving the three-dimensional elastic seismic wave equation on unstructured tetrahedral meshes. Compared with the serial CPU code running on one Intel Xeon W5880 core, this ADER-DG implementation obtained a speedup factor of about 24.3 for the single-precision version of our GPU code and about 12.8 for the double-precision version. By adding MPI parallelism and other optimization schemes, we further improved our ADER-DG code, which then obtained a speedup factor of about 28.3 for the single-precision version and about 14.9 for the double-precision version. To effectively overlap inter-process communication with computation, we separate the elements on each sub-domain into inner and outer elements, and we first complete the computation on the outer elements and fill the MPI buffer. While the MPI messages travel across the network, the GPU performs computation on inner elements and all other calculations that do not use information of outer elements from neighboring sub-domains. A significant portion of the speedup also comes from a customized matrix-matrix multiplication kernel, which is used extensively throughout our program. Preliminary performance analysis of our parallel GPU codes shows favorable strong and weak scalability.

The second numerical method we ported is a fourth-order finite-difference method. In this implementation, we utilized a staggered grid, a dual-layer mesh grid, the classical Perfectly Matched Layer (PML), and many GPU optimization techniques to enhance the efficiency of our code. Compared with the serial double-precision CPU code running on one Intel Xeon W5880 core, our finite-difference implementation obtained a speedup factor of about 62 for the single-precision version of our GPU code and about 31 for the double-precision version.
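
The inner/outer element split used to overlap communication with computation can be illustrated with a small sketch. It is not the dissertation's code: the function names `split_inner_outer` and `time_step_schedule`, the toy one-dimensional mesh, and the final "wait" and "apply" steps are assumptions made for illustration, as is the use of a non-blocking exchange. Only the ordering (outer elements first, then buffer fill, then inner elements while messages are in flight) comes from the abstract.

```python
from typing import Dict, List, Tuple


def split_inner_outer(
    neighbors: Dict[int, List[int]],
    owner: Dict[int, int],
    rank: int,
) -> Tuple[List[int], List[int]]:
    """Split the elements owned by `rank` into (inner, outer) lists.

    Assumed definition: an element is "outer" if at least one of its
    neighbors belongs to a different sub-domain; otherwise it is "inner".
    """
    inner: List[int] = []
    outer: List[int] = []
    for elem in sorted(e for e, r in owner.items() if r == rank):
        touches_other = any(owner[n] != rank for n in neighbors[elem])
        (outer if touches_other else inner).append(elem)
    return inner, outer


def time_step_schedule(inner: List[int], outer: List[int]) -> List[str]:
    """Return the order of operations for one time step, as labels only.

    The first four steps follow the abstract; the last two are assumed
    here to close the exchange and are not taken from the source.
    """
    return [
        f"compute outer elements {outer}",
        "fill MPI buffer from outer elements",
        "start exchange with neighboring sub-domains (assumed non-blocking)",
        f"compute inner elements {inner} while messages are in flight",
        "wait for exchange to complete (assumed)",
        "use data received from neighboring sub-domains (assumed)",
    ]


# Hypothetical toy mesh: a 1D chain of 8 elements on two sub-domains.
neighbors = {e: [n for n in (e - 1, e + 1) if 0 <= n < 8] for e in range(8)}
owner = {e: 0 if e < 4 else 1 for e in range(8)}

inner0, outer0 = split_inner_outer(neighbors, owner, rank=0)
inner1, outer1 = split_inner_outer(neighbors, owner, rank=1)
schedule0 = time_step_schedule(inner0, outer0)
```

In this toy mesh only the element adjacent to the sub-domain boundary is classified as outer, so most of the work on each sub-domain can proceed while the exchange is pending. On an unstructured tetrahedral mesh the same classification would use face neighbors rather than left and right neighbors.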
Keywords/Search Tags: GPU, CPU code, Seismic wave, Method, Double-precision version, Speedup factor, Three-dimensional