
The Performance Evaluation Research Of CFD Application On Intel MIC

Posted on: 2014-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Li
GTID: 2308330479479428
Subject: Computer Science and Technology
Abstract/Summary:
Computational Fluid Dynamics (CFD) is widely used in the design of aerospace craft and ground vehicles, and it is one of the typical application fields of High Performance Computing (HPC). To improve the performance of CFD applications, it is necessary to measure and analyze their performance on HPC platforms so that the results can guide performance optimization. Aiming at a full understanding of program performance, this thesis studies the performance evaluation and modeling of CFD applications on a new high-end platform, the Intel Many Integrated Core (MIC) architecture. The main work and contributions are as follows:

(1) We evaluate the single-card performance on the MIC coprocessor of the NAS Parallel Benchmarks Multi-Zone version (NPB-MZ), a set of typical benchmark codes derived from real-world CFD applications. The parallel performance and scalability of these applications under the OpenMP and hybrid MPI/OpenMP programming models are thoroughly investigated, and the performance measured on the MIC coprocessor is compared against that obtained on compute nodes based on the Sandy Bridge processor. The results show that these programs achieve good parallel scalability when run with appropriate combinations of processes and threads. However, their absolute performance on the Intel Xeon Phi coprocessor is significantly lower than on the Sandy Bridge node, primarily because of much lower single-thread performance. These findings can help guide the performance optimization of other applications on MIC.

(2) We analyze the single-thread performance of CFD applications on the MIC coprocessor and find that under-utilization of the 512-bit Vector Processing Unit (VPU) is the main cause of the low performance. The Vectorization Intensity of the NPB-MZ applications on the MIC coprocessor is measured with Likwid, a hardware-counter-based profiling tool.
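The abstract does not define Vectorization Intensity. A commonly used definition for Xeon Phi, from Intel's coprocessor tuning guidance, is the ratio of two VPU hardware events, VPU_ELEMENTS_ACTIVE / VPU_INSTRUCTIONS_EXECUTED; with 512-bit vectors the ideal value is 8 for double precision and 16 for single precision. The sketch below assumes that definition (the thesis's exact Likwid event group may differ), and the function names and counter values are hypothetical.

```python
# Minimal sketch: vectorization intensity (VI) from two raw hardware-counter
# values on a Xeon Phi (MIC) coprocessor. Assumes VI is defined as
# VPU_ELEMENTS_ACTIVE / VPU_INSTRUCTIONS_EXECUTED; the counter values would
# come from a profiler such as Likwid, and the sample numbers are hypothetical.

VPU_WIDTH_BITS = 512  # width of the MIC vector processing unit


def lanes(element_bits: int) -> int:
    """Number of SIMD lanes for a given element size (64 -> 8, 32 -> 16)."""
    return VPU_WIDTH_BITS // element_bits


def vectorization_intensity(elements_active: int, instructions_executed: int) -> float:
    """Average number of active vector elements per executed VPU instruction."""
    if instructions_executed <= 0:
        raise ValueError("instructions_executed must be positive")
    return elements_active / instructions_executed


def lane_utilization(vi: float, element_bits: int = 64) -> float:
    """Fraction of the available SIMD lanes that were active on average."""
    return vi / lanes(element_bits)


if __name__ == "__main__":
    # Hypothetical readings for a double-precision kernel.
    vi = vectorization_intensity(elements_active=2_400, instructions_executed=1_000)
    print(f"VI = {vi:.2f}, lane utilization = {lane_utilization(vi):.0%}")
```

A value well below the lane count (here 2.4 out of 8) is the kind of signal that points to loops running scalar or with partially filled vectors.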
The compiler's vectorization report is then used to locate the key code constructs that cannot be vectorized and to determine why they cannot be vectorized. These results further reveal the performance characteristics of CFD applications on the MIC architecture, and based on them we provide suggestions that are helpful for optimizing CFD applications on MIC.

(3) We evaluate the efficacy of the hardware prefetching of Intel MIC using the Stanza Triad microbenchmark. An analytical model based on the prefetching behavior accurately captures the memory overhead of a given data-access pattern. We conclude that non-contiguous memory access is detrimental to memory-bandwidth efficiency and thus to the performance of memory-bound kernels. Programmers should therefore create the longest possible stanzas of contiguous memory accesses to obtain better prefetching and higher effective memory bandwidth. Following this idea, we study different blocking strategies for the three loop dimensions. The results show that blocking the contiguous-access dimension or the dimension divided among threads yields poor performance, while the best strategy is to block the middle dimension of the loop nest.

(4) We build a roofline model for MIC, which ties the floating-point performance and memory performance of MIC together with the operational intensity of stencil computations in a single two-dimensional graph. The model shows that the minimum operational intensity required to achieve the best performance on MIC is 3.05. In addition, the model combines optimization methods with the upper bound of performance, offering insight into the performance of stencil computations.
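The Stanza Triad access pattern described in item (3) can be sketched as follows: a STREAM-style triad a[i] = b[i] + s * c[i] is applied to contiguous runs ("stanzas") of stanza_len elements, each followed by a jump of skipped elements. This is an illustrative reconstruction, not the thesis's benchmark code; the function names are hypothetical, and pure Python only shows which elements are touched, whereas the actual measurement would time an equivalent C loop on the coprocessor.

```python
# Minimal sketch of the Stanza Triad access pattern: the triad
# a[i] = b[i] + s * c[i], applied to contiguous "stanzas" of `stanza_len`
# elements separated by gaps of `jump` skipped elements.
# Illustrates the pattern only; timing this Python code says nothing about
# MIC hardware prefetching.

from typing import List


def stanza_indices(n: int, stanza_len: int, jump: int) -> List[int]:
    """Indices visited: `stanza_len` contiguous elements, then skip `jump`."""
    if stanza_len <= 0 or jump < 0:
        raise ValueError("need stanza_len > 0 and jump >= 0")
    indices: List[int] = []
    for start in range(0, n, stanza_len + jump):
        indices.extend(range(start, min(start + stanza_len, n)))
    return indices


def num_stanzas(n: int, stanza_len: int, jump: int) -> int:
    """Number of stanza starts, i.e. points where a prefetch stream restarts."""
    if stanza_len <= 0 or jump < 0:
        raise ValueError("need stanza_len > 0 and jump >= 0")
    return len(range(0, n, stanza_len + jump))


def stanza_triad(a: List[float], b: List[float], c: List[float],
                 s: float, stanza_len: int, jump: int) -> None:
    """Update `a` in place, only at the indices covered by stanzas."""
    for i in stanza_indices(len(a), stanza_len, jump):
        a[i] = b[i] + s * c[i]
```

With jump = 0 the kernel degenerates to the ordinary contiguous triad. Shorter stanzas mean more stream restarts per byte moved, which is the effect a prefetch-aware model has to account for.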
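To make the blocking strategies in item (3) concrete, here is a sketch of a 3D 7-point stencil on a C-ordered grid u[k][j][i], computed naively and with only the middle (j) dimension blocked. It assumes that i is the contiguous dimension and that k is the dimension divided among threads; the abstract does not spell out the thesis's loop layout, and the coefficients alpha and beta and the block size bj are hypothetical. Blocking j keeps the inner i loop a full-length contiguous stanza while sweeping k over a narrow band of rows, so neighbouring k-planes of that band can stay in cache.

```python
# Minimal sketch: 3D 7-point stencil on a C-ordered grid u[k][j][i]
# (i contiguous in memory; k assumed to be the thread-divided dimension),
# computed naively and with blocking of the middle (j) dimension only.
# Pure Python shows loop order, not cache behavior.

from typing import List

Grid = List[List[List[float]]]


def zeros_like(u: Grid) -> Grid:
    return [[[0.0] * len(row) for row in plane] for plane in u]


def point(u: Grid, k: int, j: int, i: int, alpha: float, beta: float) -> float:
    """7-point update: centre weighted by alpha, six face neighbours by beta."""
    neighbours = (u[k][j][i - 1] + u[k][j][i + 1]
                  + u[k][j - 1][i] + u[k][j + 1][i]
                  + u[k - 1][j][i] + u[k + 1][j][i])
    return alpha * u[k][j][i] + beta * neighbours


def stencil_naive(u: Grid, alpha: float, beta: float) -> Grid:
    nz, ny, nx = len(u), len(u[0]), len(u[0][0])
    out = zeros_like(u)
    for k in range(1, nz - 1):          # outer dimension (split across threads)
        for j in range(1, ny - 1):      # middle dimension
            for i in range(1, nx - 1):  # contiguous dimension
                out[k][j][i] = point(u, k, j, i, alpha, beta)
    return out


def stencil_block_middle(u: Grid, alpha: float, beta: float, bj: int) -> Grid:
    """Same result; j is cut into bands of bj rows, each band is swept over
    all k, and i stays a full-length contiguous run."""
    nz, ny, nx = len(u), len(u[0]), len(u[0][0])
    out = zeros_like(u)
    for jj in range(1, ny - 1, bj):
        for k in range(1, nz - 1):
            for j in range(jj, min(jj + bj, ny - 1)):
                for i in range(1, nx - 1):
                    out[k][j][i] = point(u, k, j, i, alpha, beta)
    return out
```

Both functions perform the same arithmetic per point, so their outputs match exactly; only the traversal order, and therefore the memory-access pattern on real hardware, differs.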
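The roofline bound in item (4) can be written as attainable performance = min(peak FLOP rate, operational intensity × memory bandwidth), and its ridge point, peak / bandwidth, is the minimum operational intensity at which the compute ceiling becomes reachable; this is the quantity the thesis reports as 3.05 for MIC. The sketch below uses hypothetical machine numbers, not the thesis's MIC measurements, and the 7-point-stencil traffic estimate (8 flops and 16 bytes per point, ignoring write-allocate traffic) is an idealized assumption.

```python
# Minimal roofline sketch: attainable performance is the lower of the
# compute ceiling and the bandwidth ceiling at a given operational
# intensity (OI, in flop/byte). Machine numbers below are hypothetical.


def attainable_gflops(oi: float, peak_gflops: float, bw_gbs: float) -> float:
    """Roofline bound: min(peak, OI * bandwidth)."""
    return min(peak_gflops, oi * bw_gbs)


def ridge_point(peak_gflops: float, bw_gbs: float) -> float:
    """Smallest OI at which the compute ceiling can be reached."""
    return peak_gflops / bw_gbs


def operational_intensity(flops_per_point: float, bytes_per_point: float) -> float:
    return flops_per_point / bytes_per_point


def bound_type(oi: float, peak_gflops: float, bw_gbs: float) -> str:
    return "compute-bound" if oi >= ridge_point(peak_gflops, bw_gbs) else "memory-bound"


if __name__ == "__main__":
    peak, bw = 1000.0, 250.0  # hypothetical GFLOP/s and GB/s, not MIC figures
    # Idealized 7-point stencil: 6 adds + 2 multiplies = 8 flops per point,
    # one 8-byte read and one 8-byte write per point (perfect cache reuse).
    oi = operational_intensity(8, 16)
    print(oi, ridge_point(peak, bw), attainable_gflops(oi, peak, bw), bound_type(oi, peak, bw))
```

Under these assumptions the stencil sits far left of the ridge, which is why optimizations that raise effective bandwidth or operational intensity (such as the blocking above) matter more than raw FLOP rate.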
Keywords/Search Tags: Intel MIC, CFD, NPB-MZ, Vectorization, Stencil Computing, Performance evaluation, Performance modeling