
Parallelization And Optimization Of The Finite Difference Method Cardiac Model Using Many Integrated Cores (MIC) And Multi-Core CPU

Posted on: 2014-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: J Yang
Full Text: PDF
GTID: 2284330479979307
Subject: Computer Science and Technology
Abstract/Summary:
Sudden cardiac death is one of the leading causes of death and remains a major health problem. Cell-level simulation of cardiac behavior can help address it. However, because of its large time and memory requirements, only supercomputers can meet the computational demands of such a simulation. Modern supercomputer development is now dominated by designs with multiple accelerators per node, such as Tianhe-2, which has three coprocessors (MICs) within one compute node. This trend makes efficient data communication among coprocessors more important than optimizing the performance of a single coprocessor. No cardiac model simulation exists for such a platform combining a CPU with three coprocessors, let alone research on efficient programming models and data communication methods for it. To solve these problems, we carry out the following research:

Firstly, this paper chooses a simple example of cardiac model simulation based on the finite difference method to explore and complete the parallel programming and optimization methods on a general multi-core CPU platform. The chosen three-dimensional model consists of two parts, a partial differential equation (PDE) and an ordinary differential equation (ODE). The PDE part contains a three-dimensional 7-point stencil computation, and the ODE part is simulated by a 4-variable F-Euler finite difference method. We parallelized and optimized the model through OpenMP, manual vectorization, aligned memory access, and numerical algorithm optimization. The optimization on a general CPU provides the basis for further parallelization and optimization in coprocessor programming.

Secondly, building on the simple finite difference model computation, we selected a real-world scattering-reflectance model, where the scattering part mainly involves solving the PDEs and the reflection part involves solving the ODEs. We are the first to design and implement two different kinds of offload methods for it: offload using pragma directives, and offload based on system-level APIs (COI and SCIF). We also parallelize and optimize the two methods by task blocking, latency hiding, direct data communication, and other techniques. All of these implementations run on one compute node of the Tianhe-2 supercomputer with three coprocessors.

Thirdly, within the context of the offload programming model, this paper carries out a detailed comparison between the two approaches, one using compiler directives and the other combining Intel's COI and SCIF APIs for low-latency communication. While the first approach allows simpler programming, the latter has three advantages: (1) lower overhead associated with launching offloaded code, (2) higher data transfer bandwidths, and (3) more advanced asynchrony between computation and data movement. As a result, the low-level COI-SCIF approach shows a considerable performance advantage on a Tianhe-2 compute node with three Xeon Phi coprocessors.
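The two kernels named in the abstract, a three-dimensional 7-point stencil for the PDE part and an explicit Euler update for the 4-variable ODE part, can be illustrated with a minimal C sketch. It is not the thesis code. It assumes that "F-Euler" means forward Euler. The function names stencil7_step and forward_euler_step, the coefficient coef (standing in for something like D*dt/h^2), and the linear-decay right-hand side with rate[] are all hypothetical placeholders, since the abstract does not give the actual cell model. The OpenMP pragmas only show where the loop-level parallelism would go; without OpenMP enabled the code runs serially.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NVAR 4  /* four ODE state variables per cell, as stated in the abstract */

/* Flattened index into an nx*ny*nz grid, with x varying fastest. */
static size_t idx(size_t i, size_t j, size_t k, size_t nx, size_t ny) {
    return i + nx * (j + ny * k);
}

/* One explicit diffusion step using a 7-point stencil:
 *   out = in + coef * (sum of 6 face neighbours - 6 * centre)
 * coef is a hypothetical lumped coefficient. Boundary cells are
 * simply copied unchanged (an assumption made for this sketch). */
void stencil7_step(const double *in, double *out,
                   size_t nx, size_t ny, size_t nz, double coef) {
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    memcpy(out, in, nx * ny * nz * sizeof(double)); /* defines the boundaries */

    #pragma omp parallel for collapse(2)
    for (size_t k = 1; k < nz - 1; ++k) {
        for (size_t j = 1; j < ny - 1; ++j) {
            for (size_t i = 1; i < nx - 1; ++i) {
                size_t c = idx(i, j, k, nx, ny);
                double lap = in[c - 1] + in[c + 1]              /* x neighbours */
                           + in[c - nx] + in[c + nx]            /* y neighbours */
                           + in[c - nx * ny] + in[c + nx * ny]  /* z neighbours */
                           - 6.0 * in[c];
                out[c] = in[c] + coef * lap;
            }
        }
    }
}

/* One forward Euler step, y <- y + dt * f(y), applied to NVAR variables
 * in every cell. The right-hand side f(y) = -rate[v] * y is a placeholder,
 * not the cardiac cell model used in the thesis. */
void forward_euler_step(double *state, size_t ncells,
                        const double rate[NVAR], double dt) {
    #pragma omp parallel for
    for (size_t c = 0; c < ncells; ++c) {
        for (int v = 0; v < NVAR; ++v) {
            double y = state[c * NVAR + v];
            state[c * NVAR + v] = y + dt * (-rate[v] * y);
        }
    }
}
```

In a time loop, a caller would alternate the two steps and swap the in and out buffers after each stencil step. Storing the four variables of a cell contiguously is one possible layout; a structure-of-arrays layout may suit the manual vectorization and aligned access mentioned in the abstract better.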
Keywords/Search Tags:Intel Xeon Phi coprocessor, offload model, SCIF, Tianhe-2