Research For The Implementation And Optimization Technology Of Typical Image Processing Algorithms On Xeon Phi

Posted on: 2014-09-01 | Degree: Master | Type: Thesis
Country: China | Candidate: J Qi
GTID: 2308330479479109 | Subject: Computer Science and Technology
Abstract/Summary:
With the rise of heterogeneous systems, the high performance computing (HPC) domain has developed greatly. Heterogeneous systems based on CPU+GPU are widely applied in fields such as bioinformatics, medical imaging, and computational fluid dynamics (CFD). However, CPUs and GPUs use different instruction sets and programming models, which makes programming and optimizing an application more demanding. To address this, in 2012 Intel introduced the Xeon Phi coprocessor, based on the Many Integrated Core (MIC) architecture, which eases programming by inheriting the traditional programming models and characteristics of x86. In addition, Xeon Phi integrates over 50 lightweight x86 cores. Each core supports 4 hardware threads and contains a 512-bit wide SIMD Vector Processing Unit (VPU). Xeon Phi therefore provides powerful parallel processing capability. However, research on optimizing algorithms for Xeon Phi is still in its infancy.

In this thesis, we study how to implement and accelerate two typical image processing algorithms on the Xeon Phi platform. Image processing algorithms demand high performance because of their large data volumes and strict real-time requirements. We therefore select two representative algorithms, the 2D inverse discrete cosine transform (IDCT) and the 3D gradient vector flow (GVF) field computation, as case studies on Xeon Phi.

Our main contributions are as follows:

(1) Porting the 2D IDCT algorithm to Xeon Phi and optimizing it there. First, we implement a serial version of the algorithm based on the row-column separation method, and use its performance as the baseline for the optimized implementations that follow. Then, we extend the serial implementation to multiple threads with the OpenMP standard and vectorize it with the 512-bit SIMD intrinsics provided by Intel. Finally, we further optimize the threaded and vectorized implementation with data pre-fetching.
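The row-column separation method mentioned for the serial 2D IDCT can be illustrated with a minimal sketch. This is not the thesis code: it is plain Python rather than the OpenMP and SIMD-intrinsic implementation described, it assumes the orthonormal DCT-III scaling, and the function names (`idct_1d`, `idct_2d`, `idct_2d_direct`, `_scale`) are hypothetical. The direct four-loop version is included only to check the separable one.

```python
import math


def _scale(k, n):
    # Orthonormal scaling factor (an assumption; other conventions exist).
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def idct_1d(coeffs):
    """1D inverse DCT (DCT-III, orthonormal) of a list of coefficients."""
    n = len(coeffs)
    out = []
    for x in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            total += _scale(k, n) * c * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
        out.append(total)
    return out


def idct_2d(block):
    """2D IDCT by row-column separation: 1D IDCT on rows, then on columns."""
    rows = [idct_1d(row) for row in block]
    # zip(*rows) iterates over columns; each column gets its own 1D IDCT.
    cols = [idct_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [x][y]; transpose back to row-major [y][x].
    return [list(row) for row in zip(*cols)]


def idct_2d_direct(block):
    """Reference 2D IDCT evaluated directly from the double sum."""
    m, n = len(block), len(block[0])
    out = [[0.0] * n for _ in range(m)]
    for y in range(m):
        for x in range(n):
            for u in range(m):
                for v in range(n):
                    out[y][x] += (_scale(u, m) * _scale(v, n) * block[u][v]
                                  * math.cos(math.pi * (2 * y + 1) * u / (2 * m))
                                  * math.cos(math.pi * (2 * x + 1) * v / (2 * n)))
    return out
```

For an N×N block, the separable form performs 2N one-dimensional transforms of length N, roughly N³ multiply-adds, where the direct double sum needs roughly N⁴. The row and column passes are also independent across rows and columns, which is what makes them natural candidates for thread-level parallelism and vectorization.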
For the 2D IDCT, the tests show that vectorization achieves a 5.82× speedup on single-precision images compared with the non-vectorized implementation, and that performance scales almost linearly as the number of threads increases; in addition, data pre-fetching improves performance by about 24%. With all of these optimizations combined, the best performance of the algorithm on Xeon Phi is about 1.53 times that achieved on one E5-2670 CPU.

(2) Porting the 3D GVF field algorithm to Xeon Phi and optimizing it on the platform. In addition to discussing general optimizations such as vectorization and thread extension, we focus on how optimizations for stencil computation affect the algorithm's performance. We design an efficient loop tiling strategy that improves cache utilization and thereby reduces the performance loss. The tests show that the 3D GVF field computation on double-precision images obtains a clear performance gain: with the loop tiling strategy proposed in this thesis, the algorithm achieves its best performance on Xeon Phi, with speedups of 1.78× and 2.77× for problem sizes of 256×256×256 and 512×512×512 respectively, compared with the best performance achieved on one E5-2670 CPU.

(3) Summarizing the optimization principles for image processing algorithms on Xeon Phi and distilling techniques that can guide the optimization of other image processing algorithms. In general, for computation-intensive algorithms, good performance can be obtained directly through basic optimization techniques such as vectorization and thread extension; for algorithms with a low computation-to-memory-access ratio, increasing cache utilization should be considered first, and the loop tiling method proposed in this thesis serves this purpose well.
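The idea behind loop tiling for a 3D stencil can be sketched as follows. This is an illustration under stated assumptions, not the tiling strategy of the thesis: it uses a generic 7-point averaging stencil rather than the GVF update, the tile sizes `ti` and `tj` are hypothetical defaults, and the function names are invented here. Pure Python will not show the cache benefit; the sketch only shows how the iteration space is reordered while the result stays the same.

```python
def stencil_point(src, ny, nz, i, j, k):
    """7-point average at (i, j, k) on a flat, row-major nx*ny*nz grid."""
    idx = (i * ny + j) * nz + k
    return (src[idx]
            + src[idx - ny * nz] + src[idx + ny * nz]   # i-1, i+1 neighbours
            + src[idx - nz] + src[idx + nz]             # j-1, j+1 neighbours
            + src[idx - 1] + src[idx + 1]) / 7.0        # k-1, k+1 neighbours


def sweep_naive(src, nx, ny, nz):
    """One stencil sweep over interior points in plain i, j, k order."""
    dst = list(src)  # boundary values are copied unchanged
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                dst[(i * ny + j) * nz + k] = stencil_point(src, ny, nz, i, j, k)
    return dst


def sweep_tiled(src, nx, ny, nz, ti=4, tj=4):
    """Same sweep, with the i and j loops split into ti x tj tiles."""
    dst = list(src)
    for ii in range(1, nx - 1, ti):          # tile origin along i
        for jj in range(1, ny - 1, tj):      # tile origin along j
            # Points inside one tile reuse neighbouring planes and rows
            # while they are (on real hardware) still likely to be cached.
            for i in range(ii, min(ii + ti, nx - 1)):
                for j in range(jj, min(jj + tj, ny - 1)):
                    for k in range(1, nz - 1):  # unit-stride axis left untiled
                        dst[(i * ny + j) * nz + k] = stencil_point(src, ny, nz, i, j, k)
    return dst
```

Leaving the innermost, unit-stride `k` loop untiled is a common choice because it keeps long contiguous accesses available for vectorization; in a compiled implementation, the tile sizes would be tuned to the cache capacity of the target processor.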
Keywords/Search Tags:Xeon Phi, IDCT, 3D GVF field vector, vectorizing, thread-extending, data pre-fetching, loop tiling