Key Technology Research On Implementation And Optimization Of OpenACC On MIC

Posted on:2014-05-07

Degree:Master

Type:Thesis

Country:China

Candidate:C Chen

Full Text:PDF

GTID:2308330479979448

Subject:Computer Science and Technology

Abstract/Summary:


Heterogeneous systems, which offer high GFLOPS at a relatively low power footprint, have become a focus of current research. Following the success of GPUs in general-purpose computing, Intel unveiled its Many Integrated Core (MIC) coprocessor based on the Intel Architecture (IA). Xeon Phi, the second generation of MIC products, has been successfully deployed in Milky Way II. However, the difficulty of programming and optimizing heterogeneous systems limits their adoption and application. In recent years, pragma-based programming interfaces such as the OpenACC standard have been studied extensively, yet to date there has been no implementation of OpenACC on MIC.

In this paper, we use OpenACC to program the new Intel MIC coprocessor by automatically translating OpenACC source code into Intel Offload (Offload for short) source code, and we then optimize the translated code according to the OpenACC specification and the MIC architecture. Our main contributions are as follows:

1. A translation model that maps OpenACC to Offload. We analyze the essential factors of directive-based heterogeneous programming, namely task management, parallelism description, and data management, and propose source-to-source translation methods based on how OpenACC and Offload relate on these factors. Because the OpenACC specification and the MIC architecture diverge, we carefully study the efficiency of the resulting translation policy.
2. Optimization methods derived from this efficiency study: task repartitioning, vectorization that exploits the 512-bit SIMD units of MIC, and a tree barrier synchronization algorithm based on SIMD intrinsics.
3. A compiler framework that automatically translates OpenACC source code into Offload source code. It adopts a layered design: the front-end module translates OpenACC source code into an intermediate representation (IR), the middle-end module optimizes the IR, and the back-end module emits Offload source code.
4. An evaluation of the mapping and translation efficiency using two kernels, matrix multiplication and JACOBI, on a MIC-based platform (one Xeon Phi card) and a GPU-based platform (one NVIDIA Tesla K20c card). Both kernels achieve a speedup of approximately 3× on one Xeon Phi card over one eight-core Intel Xeon E5-2670 CPU, and both perform better on the MIC-based platform than on the GPU-based one.
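
The OpenACC-to-Offload mapping can be made concrete with the matrix multiplication kernel. The pair of functions below is a hand-written illustration of that kind of mapping, not output of the thesis translator: the OpenACC `copyin`/`copyout` data clauses become Offload `in`/`out` clauses with an explicit `length(...)`, and the `parallel loop` becomes an OpenMP `parallel for` executed on the coprocessor. The function names and the use of `collapse(2)` are assumptions. The C standard requires unrecognized pragmas to be ignored, so this sketch also builds and runs as plain serial C on a host without OpenACC or Offload support.

```c
#include <assert.h>

/* OpenACC input (illustrative): C = A * B for n x n row-major matrices. */
void matmul_acc(const float *restrict A, const float *restrict B, float *restrict C, int n) {
    assert(A != C && B != C); /* the output must not alias the inputs */
    #pragma acc parallel loop collapse(2) copyin(A[0:n*n], B[0:n*n]) copyout(C[0:n*n])
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Hand-written Intel Offload counterpart (hypothetical mapping): each OpenACC
   data clause becomes an in/out transfer with an explicit length, and the
   parallel loop becomes an OpenMP loop that runs on the coprocessor. */
void matmul_offload(const float *restrict A, const float *restrict B, float *restrict C, int n) {
    assert(A != C && B != C);
    #pragma offload target(mic) in(A : length(n * n)) in(B : length(n * n)) out(C : length(n * n))
    {
        #pragma omp parallel for collapse(2)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                float sum = 0.0f;
                for (int k = 0; k < n; k++)
                    sum += A[i * n + k] * B[k * n + j];
                C[i * n + j] = sum;
            }
    }
}
```

Both versions compute the same result; only the directives that describe data movement and parallelism differ.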
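
For the vectorization optimization, note that one 512-bit SIMD register holds 512 / 32 = 16 single-precision values. The sketch below shows one common way to expose that width in a JACOBI-style sweep, assumed here to be a 5-point stencil for the 2D Laplace equation: threads split the rows, and the inner column loop is marked with the OpenMP `simd` directive. The stencil form, the `jacobi_sweep` name, and the use of OpenMP `simd` are all assumptions; the abstract does not say how the thesis vectorizes its kernels.

```c
#include <assert.h>

/* One Jacobi sweep for the 2D Laplace equation on an n x m row-major grid
   (hypothetical 5-point stencil). Only interior points are written, so the
   caller must keep the boundary values in unew. Returns the largest absolute
   change, a common convergence measure. */
float jacobi_sweep(const float *restrict u, float *restrict unew, int n, int m) {
    assert(u != unew); /* Jacobi reads the old grid and writes a separate new one */
    float err = 0.0f;
    /* Rows are shared among threads (cores). */
    #pragma omp parallel for reduction(max : err)
    for (int i = 1; i < n - 1; i++) {
        /* Columns of a row are mapped onto SIMD lanes (16 floats per 512-bit register). */
        #pragma omp simd reduction(max : err)
        for (int j = 1; j < m - 1; j++) {
            float v = 0.25f * (u[(i - 1) * m + j] + u[(i + 1) * m + j]
                               + u[i * m + j - 1] + u[i * m + j + 1]);
            float d = v - u[i * m + j];
            d = d < 0.0f ? -d : d;   /* absolute change at this point */
            err = d > err ? d : err; /* running maximum */
            unew[i * m + j] = v;
        }
    }
    return err;
}
```

A driver would copy `u` into `unew` once, then alternate the two buffers between sweeps until the returned change falls below a tolerance.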
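
The abstract gives no details of the tree barrier, so the following is only a generic combining-tree barrier with sense reversal, written with C11 atomics and POSIX threads. It does not use MIC SIMD intrinsics, which is where the thesis algorithm differs. Threads are grouped `FAN_IN` at a time (a hypothetical radix of 4) at the leaves; the last arrival at each node climbs to its parent, and the release then propagates back down the tree. `tree_barrier_selftest` is a hypothetical helper that checks that no thread leaves an episode before every thread has arrived.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define FAN_IN 4 /* hypothetical radix: at most 4 arrivals combine at each node */

typedef struct tb_node {
    atomic_int count;       /* arrivals still expected in the current episode */
    int k;                  /* participants at this node (threads or child nodes) */
    atomic_bool node_sense; /* set to the episode's sense when the node releases */
    struct tb_node *parent; /* NULL at the root */
} tb_node;

typedef struct { tb_node *nodes; } tree_barrier; /* leaf i serves threads FAN_IN*i onward */

int tree_barrier_init(tree_barrier *b, int nthreads) {
    assert(nthreads > 0);
    /* ceil(T/FAN_IN) leaves, and each level above has at most half as many nodes: 2*T suffice. */
    b->nodes = calloc(2 * (size_t)nthreads, sizeof *b->nodes);
    if (!b->nodes) return -1;
    int below = nthreads, start = -1, next = 0; /* start < 0: the level below is the threads */
    do {
        int size = (below + FAN_IN - 1) / FAN_IN;
        for (int j = 0; j < size; j++) {
            tb_node *n = &b->nodes[next + j];
            n->k = below - j * FAN_IN < FAN_IN ? below - j * FAN_IN : FAN_IN;
            n->parent = NULL;
            atomic_init(&n->count, n->k);
            atomic_init(&n->node_sense, false);
        }
        if (start >= 0) /* link the previous level of nodes to this one */
            for (int i = 0; i < below; i++) b->nodes[start + i].parent = &b->nodes[next + i / FAN_IN];
        start = next; next += size; below = size;
    } while (below > 1);
    return 0;
}

static void arrive(tb_node *n, bool sense) {
    if (atomic_fetch_sub(&n->count, 1) == 1) {   /* last arrival at this node */
        if (n->parent) arrive(n->parent, sense);  /* returns after the root releases */
        atomic_store(&n->count, n->k);            /* reset before releasing */
        atomic_store(&n->node_sense, sense);      /* release the waiters at this node */
    }
    while (atomic_load(&n->node_sense) != sense)
        sched_yield();                            /* wait (yielding) for the release */
}

/* Each thread owns local_sense, initialized to true before its first call. */
void tree_barrier_wait(tree_barrier *b, int tid, bool *local_sense) {
    arrive(&b->nodes[tid / FAN_IN], *local_sense);
    *local_sense = !*local_sense;
}
void tree_barrier_destroy(tree_barrier *b) { free(b->nodes); b->nodes = NULL; }

typedef struct { tree_barrier *b; int tid, nthreads, phases; atomic_int *arrived, *errors; } worker_arg;

static void *worker(void *p) {
    worker_arg *a = p;
    bool sense = true;
    for (int ph = 0; ph < a->phases; ph++) {
        atomic_fetch_add(&a->arrived[ph], 1);
        tree_barrier_wait(a->b, a->tid, &sense);
        if (atomic_load(&a->arrived[ph]) != a->nthreads) /* someone left too early */
            atomic_fetch_add(a->errors, 1);
    }
    return NULL;
}

/* Hypothetical self-check: returns the number of violations, or -1 on setup failure. */
int tree_barrier_selftest(int nthreads, int phases) {
    tree_barrier b;
    atomic_int errors;
    atomic_init(&errors, 0);
    atomic_int *arrived = calloc((size_t)phases, sizeof *arrived);
    pthread_t *th = malloc((size_t)nthreads * sizeof *th);
    worker_arg *args = malloc((size_t)nthreads * sizeof *args);
    if (!arrived || !th || !args || tree_barrier_init(&b, nthreads) != 0) {
        free(arrived); free(th); free(args);
        return -1;
    }
    for (int i = 0; i < phases; i++) atomic_init(&arrived[i], 0);
    for (int t = 0; t < nthreads; t++) {
        args[t] = (worker_arg){&b, t, nthreads, phases, arrived, &errors};
        pthread_create(&th[t], NULL, worker, &args[t]);
    }
    for (int t = 0; t < nthreads; t++) pthread_join(th[t], NULL);
    tree_barrier_destroy(&b);
    free(arrived); free(th); free(args);
    return atomic_load(&errors);
}
```

Each thread calls `tree_barrier_wait(&b, tid, &local_sense)` at every synchronization point; build with `-pthread`. Flipping the per-thread sense after each episode lets the same tree be reused without a separate reset phase.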
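
The layered front-end, middle-end, and back-end design can be illustrated with a deliberately tiny directive translator. Everything in it is hypothetical: the IR (`region_ir`), the supported clause subset (`copyin`, `copyout`, `copy`, with one `name[start:length]` section each), and the single middle-end pass, which folds a `copyin` and a `copyout` of the same section into one Offload `inout` transfer. It rewrites only the directive text, not the loop body.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CLAUSES 8

/* Hypothetical IR for one compute region: a list of data-transfer clauses. */
typedef enum { D_IN, D_OUT, D_INOUT } dir_t;
typedef struct { dir_t dir; char var[32]; char len[32]; } data_clause;
typedef struct { int n; data_clause c[MAX_CLAUSES]; } region_ir;

/* Front end: parse "acc parallel loop" followed by copyin/copyout/copy clauses. */
static int parse_acc(const char *src, region_ir *ir) {
    static const char kw[] = "acc parallel loop";
    const char *p = strstr(src, kw);
    if (!p) return -1;
    p += sizeof kw - 1;
    ir->n = 0;
    for (;;) {
        char name[16], var[32], len[32];
        int used = 0;
        while (*p == ' ') p++;
        if (*p == '\0') return 0;
        /* clause(var[start:len]); used stays 0 unless the closing ')' matched */
        if (ir->n == MAX_CLAUSES ||
            sscanf(p, "%15[a-z] ( %31[A-Za-z0-9_] [ %*[^:] : %31[^]] ] )%n",
                   name, var, len, &used) != 3 || used == 0)
            return -1;
        data_clause *c = &ir->c[ir->n++];
        if (strcmp(name, "copyin") == 0) c->dir = D_IN;
        else if (strcmp(name, "copyout") == 0) c->dir = D_OUT;
        else if (strcmp(name, "copy") == 0) c->dir = D_INOUT;
        else return -1;
        strcpy(c->var, var);
        strcpy(c->len, len);
        p += used;
    }
}

/* Middle end (toy pass): copyin + copyout of the same section becomes one inout. */
static void merge_inout(region_ir *ir) {
    for (int i = 0; i < ir->n; i++)
        for (int j = i + 1; j < ir->n; j++) {
            const data_clause *a = &ir->c[i], *b = &ir->c[j];
            if (strcmp(a->var, b->var) != 0 || strcmp(a->len, b->len) != 0) continue;
            if (!((a->dir == D_IN && b->dir == D_OUT) || (a->dir == D_OUT && b->dir == D_IN))) continue;
            ir->c[i].dir = D_INOUT;
            memmove(&ir->c[j], &ir->c[j + 1], (size_t)(ir->n - j - 1) * sizeof ir->c[0]);
            ir->n--;
            j--; /* re-examine the clause that moved into slot j */
        }
}

/* Back end: print an Intel Offload directive followed by an OpenMP loop directive. */
static int emit_offload(const region_ir *ir, char *out, size_t cap) {
    static const char *dname[] = {"in", "out", "inout"};
    int w = snprintf(out, cap, "#pragma offload target(mic)");
    for (int i = 0; i < ir->n && w >= 0 && (size_t)w < cap; i++)
        w += snprintf(out + w, cap - (size_t)w, " %s(%s : length(%s))",
                      dname[ir->c[i].dir], ir->c[i].var, ir->c[i].len);
    if (w >= 0 && (size_t)w < cap)
        w += snprintf(out + w, cap - (size_t)w, "\n#pragma omp parallel for");
    return (w >= 0 && (size_t)w < cap) ? 0 : -1;
}

/* Driver: front end -> middle end -> back end. Returns 0 on success. */
int translate_acc_pragma(const char *src, char *out, size_t cap) {
    assert(src && out && cap > 0);
    region_ir ir;
    if (parse_acc(src, &ir) != 0) return -1;
    merge_inout(&ir);
    return emit_offload(&ir, out, cap);
}
```

For example, `translate_acc_pragma("acc parallel loop copyin(a[0:n]) copyout(b[0:n])", buf, sizeof buf)` fills `buf` with `#pragma offload target(mic) in(a : length(n)) out(b : length(n))` followed by `#pragma omp parallel for` on the next line. Keeping the IR separate from both syntaxes is what lets optimization passes run without knowing either directive language.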

Keywords/Search Tags:

Heterogeneous System, GPU, OpenACC, MIC, Intel Offload, Barrier Synchronization


Related items

1	A Compiler For Automatic Translating OpenACC Program To Intel Multicore And Manycore Platform
2	Research On Traffic Offload Strategy In Wireless Heterogeneous Networks
3	Research On Key Technologies For Realizing And Accelerating LARED-P On Intel Xeon Phi
4	Study Of Porting And Optimization Of GTC-P On Large Scale System Using OpenACC
5	The Evaluation Of Portable Performance For OpenACC 2.0
6	Research On Barrier Coverage Problem Algorithms In Wireless Sensor Networks
7	Research On OpenACC-based Automatic Parallelization Technology
8	Design And Implementation Of Accelerating Compression Subsystem Based On OpenACC
9	Research On Synchronization Technology In Heterogeneous Broadcast/Multicast Radio Networks
10	Research On Heterogeneous System Oriented Parallel Programming