
Compile-time and run-time optimizations for enhancing locality and parallelism on multi-core and many-core systems

Posted on: 2010-08-22  Degree: Ph.D  Type: Dissertation
University: The Ohio State University  Candidate: Baskaran, Muthu Manikandan  Full Text: PDF
GTID: 1448390002988508  Subject: Computer Science
Abstract/Summary:
Current trends in computer architecture exemplify the emergence of multiple processor cores on a chip. Modern multiple-core computer architectures, including general-purpose multi-core architectures (from Intel, AMD, IBM, and Sun) and specialized parallel architectures such as the Cell Broadband Engine and Graphics Processing Units (GPUs), offer very high computational power per chip. A significant challenge in these systems is the effective, load-balanced utilization of the processor cores. The memory subsystem has always been a performance bottleneck in computer systems, and it is even more so with the emergence of processor subsystems containing multiple on-chip cores. Effectively managing the on-chip and off-chip memories and enhancing data reuse to maximize memory performance is another significant challenge in modern multiple-core architectures.

Our work addresses these challenges in multi-core and many-core systems through various compile-time and run-time optimization techniques. We provide effective automatic compiler support for managing on-chip and off-chip memory accesses, with the compiler making effective decisions on what elements to move into and out of on-chip memory, when and how to move them, and how to efficiently access the elements brought into on-chip memory. We develop an effective tiling approach for mapping computation in regular programs onto many-core systems such as GPUs. We also develop an automatic approach for compiler-assisted dynamic scheduling of computation to enhance load balancing during parallel tiled execution on multi-core systems.

There are various issues specific to the target architecture that need attention to maximize application performance. First, the levels of parallelism available and the appropriate granularity of parallelism for the target architecture must be considered when mapping the computation. Second, the memory access model may be inherent to the architecture, and optimizations must be developed for that specific access model. We develop compile-time transformation approaches to address GPU-architecture-specific performance factors related to parallelism and data locality, and we build an end-to-end compiler framework for GPUs.
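The tiling transformation mentioned above can be illustrated with a minimal sketch. This is not the dissertation's compiler output; it is a hand-written Python analog of a tiled matrix multiply, where the hypothetical tile size `T` stands in for the amount of data that fits in fast on-chip memory (shared memory on a GPU, cache on a CPU), so each tile of the operands is reused while it is resident there.

```python
def matmul_tiled(A, B, T=2):
    # Minimal loop-tiling sketch for an n x n matrix multiply.
    # T is a hypothetical tuning parameter: the tile edge length.
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    # Outer loops walk over tiles; inner loops compute within a tile,
    # so each T x T block of A and B is reused while "on chip".
    for ii in range(0, n, T):
        for jj in range(0, n, T):
            for kk in range(0, n, T):
                for i in range(ii, min(ii + T, n)):
                    for j in range(jj, min(jj + T, n)):
                        s = C[i][j]
                        for k in range(kk, min(kk + T, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
    return C
```

The tiled loop nest computes exactly what the untiled triple loop would; only the iteration order changes, which is what improves locality without altering semantics.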
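The compiler-assisted dynamic scheduling idea can likewise be sketched under stated assumptions. The following is not the dissertation's runtime; it is a generic work-queue scheduler in Python (function names `run_tiles_dynamic` and `work` are hypothetical) in which ready tiles sit on a shared queue and idle workers pull the next tile, so faster workers naturally take on more tiles and the load stays balanced.

```python
import queue
import threading

def run_tiles_dynamic(tiles, work, n_workers=4):
    # Dynamic-scheduling sketch: tiles are enqueued up front and
    # worker threads repeatedly grab the next available tile.
    q = queue.Queue()
    for t in tiles:
        q.put(t)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                t = q.get_nowait()  # next ready tile, if any
            except queue.Empty:
                return  # no tiles left; this worker retires
            r = work(t)
            with lock:
                results[t] = r

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```

A real tiled-execution runtime would also track inter-tile dependences, releasing a tile to the queue only when its predecessors finish; this sketch assumes all tiles are independent.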
Keywords/Search Tags: Parallelism, Architecture, Systems, Processor cores, Multi-core, Many-core, Compile-time, Performance