
Performance portability of parallel kernels on shared-memory systems

Posted on: 2014-11-07
Degree: Ph.D.
Type: Dissertation
University: University of Illinois at Urbana-Champaign
Candidate: Stratton, John Andrew
Full Text: PDF
GTID: 1458390008454438
Subject: Engineering
Abstract/Summary:
This work describes my solution to the performance portability problem, between CPUs and GPUs in particular, while laying the foundation for even broader performance portability support. I argue that the best approach is to use a language like OpenCL as a portable, low-level programming model with well-defined mechanisms for expressing multi-level parallelism and locality. That low-level program representation can be supported with architecture-specific compilers, runtimes, and libraries that target the application code to various platforms with high performance. High-level language designers or tool developers could then target this single, low-level programming and parallelism model as a portable, high-performance intermediate program representation.

To demonstrate the feasibility of this approach, I show how one would design a good CPU implementation of OpenCL, given that the programs are written according to the current high-level GPU vendor optimization guidelines. Programs written in such a way already meet the criteria for good GPU performance. In this work, I show that those same programs, run on a CPU platform implemented according to my proposals, can outperform an OpenMP implementation of the same algorithm on the same system.
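The two-level parallelism that an OpenCL-style model expresses can be illustrated with a small sketch. This is not the dissertation's compiler or runtime, which the abstract does not describe; it is a hypothetical Python illustration of one way a CPU runtime might map such a kernel: work-groups are handed to worker threads (outer level), and the work-items within a group are serialized in a loop (inner level) so that neighbouring items touch neighbouring memory. The names `saxpy_kernel`, `run_work_group`, and `launch` are invented for this example, and barrier synchronization between work-items is deliberately left out.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_kernel(group_id, local_id, local_size, a, x, y, out):
    """Body of one work-item, written in an OpenCL-like style (hypothetical)."""
    # Global index, analogous to get_global_id(0) in OpenCL C.
    gid = group_id * local_size + local_id
    # Guard against the padded tail of the last work-group.
    if gid < len(x):
        out[gid] = a * x[gid] + y[gid]


def run_work_group(kernel, group_id, local_size, args):
    """Inner level: run every work-item of one group in a serial loop."""
    for local_id in range(local_size):
        kernel(group_id, local_id, local_size, *args)


def launch(kernel, global_size, local_size, args, workers=4):
    """Outer level: distribute work-groups across a pool of worker threads."""
    num_groups = (global_size + local_size - 1) // local_size  # round up
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_work_group, kernel, g, local_size, args)
            for g in range(num_groups)
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0] * 5
out = [0.0] * 5
launch(saxpy_kernel, len(x), 2, (2.0, x, y, out))
print(out)  # [12.0, 14.0, 16.0, 18.0, 20.0]
```

Python threads do not give real CPU parallelism for this kind of loop, so the sketch shows only the structure of the mapping. A native implementation would presumably compile the inner loop to vectorized machine code and would need extra handling for kernels that use barriers.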
Keywords/Search Tags: Performance portability