
Performance Optimization of Stencil Computations on Modern SIMD Architectures

Posted on: 2015-09-26
Degree: Ph.D.
Type: Dissertation
University: The Ohio State University
Candidate: Henretty, Thomas Steel
Full Text: PDF
GTID: 1478390017993715
Subject: Computer Science
Abstract/Summary:
Performance of scientific computing codes on modern high-performance computing (HPC) systems has, in some cases, not achieved a significant percentage of the system's peak performance. Three of the fundamental causes of this lack of efficiency are (1) less than optimal utilization of the short-vector SIMD units found in nearly all modern HPC systems, (2) less than optimal utilization of the memory hierarchy, and (3) less than optimal utilization of all computing cores available in a system. Codes that are able to overcome one or more of these limitations are generally very complex, and their implementation requires both an expert programmer and a substantial amount of time.

In this work, a class of scientific computing codes known as stencil computations is examined and shown to exhibit a fundamental algorithmic limitation that interferes with the generation of optimal SIMD code. A data layout transformation (DLT) to overcome this limitation is described, and comprehensive results for cache-resident problem sizes are presented. It is shown that this DLT can significantly increase the performance of stencil computations on modern SIMD architectures.

While substantial performance gains can be realized using the DLT for small problem sizes, larger problem sizes require the application of spatial and temporal loop tiling techniques to relieve pressure on the memory subsystem and exploit all available multicore parallelism. Two closely related tiling techniques, nested and hybrid split tiling, are developed and shown to exhibit high performance across a variety of modern multicore SIMD architectures and stencil benchmarks.

Combining SIMD, memory hierarchy, and parallelism optimizations for stencil computations leads to code that is very complex and difficult for scientists and even seasoned programmers to implement. Further, these optimizations are difficult to integrate into a general-purpose compiler, as there is no existing framework for reliably identifying and representing stencil computations in a general-purpose language such as C. These problems are resolved with the creation of the Stencil Domain Specific Language (SDSL). This language uses data structures and concepts specific to stencil computations to enable the retention of fundamental information about the stencil throughout the compilation process. Preserving the details of a stencil computation enables the automated generation of complex, highly optimized code for multiple parallel vector architectures from a simple specification in SDSL.
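For readers unfamiliar with the term, the following is a minimal sketch of what a stencil computation looks like in plain C. It is not taken from the dissertation: the function names `jacobi_1d_sweep` and `jacobi_1d`, the 3-point averaging weights, and the copy-through boundary handling are all illustrative assumptions. Each output point is computed from a fixed neighborhood of input points, so neighboring iterations read the same input elements at shifted offsets.

```c
#include <assert.h>
#include <stddef.h>

/* One sweep of a 3-point averaging stencil over a 1D grid.
 * Hypothetical example: the weights and boundary handling are assumptions.
 * Boundary points are copied through unchanged. */
void jacobi_1d_sweep(const double *in, double *out, size_t n)
{
    assert(n >= 2);
    out[0] = in[0];
    out[n - 1] = in[n - 1];
    /* Interior points: each output depends on in[i-1], in[i], in[i+1]. */
    for (size_t i = 1; i + 1 < n; ++i) {
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
    }
}

/* Run `steps` sweeps, ping-ponging between two buffers.
 * Returns a pointer to whichever buffer holds the final result. */
double *jacobi_1d(double *a, double *b, size_t n, int steps)
{
    for (int t = 0; t < steps; ++t) {
        jacobi_1d_sweep(a, b, n);
        double *tmp = a;  /* swap roles of input and output buffers */
        a = b;
        b = tmp;
    }
    return a;
}
```

The outer time loop and inner space loop in this sketch are the loops that spatial and temporal tiling techniques, such as those named in the abstract, would restructure for larger grids.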
Keywords/Search Tags:Stencil, SIMD, Performance, Modern, Architectures, Less than optimal utilization, Code, Computing