
Compile-time and run-time optimizations for enhancing locality and parallelism on multi-core and many-core systems

Posted on: 2010-08-22  Degree: Ph.D  Type: Dissertation
University: The Ohio State University  Candidate: Baskaran, Muthu Manikandan  Full Text: PDF
GTID: 1448390002988508  Subject: Computer Science
Abstract/Summary:
Current trends in computer architecture exemplify the emergence of multiple processor cores on a chip. Modern multiple-core computer architectures, including general-purpose multi-core architectures (from Intel, AMD, IBM, and Sun) and specialized parallel architectures such as the Cell Broadband Engine and Graphics Processing Units (GPUs), offer very high computational power per chip. A significant challenge in these systems is the effective, load-balanced utilization of the processor cores. The memory subsystem has always been a performance bottleneck in computer systems, and it is even more so with the emergence of processor subsystems containing multiple on-chip cores. Effectively managing the on-chip and off-chip memories and enhancing data reuse to maximize memory performance is another significant challenge in modern multiple-core architectures.

Our work addresses these challenges in multi-core and many-core systems through various compile-time and run-time optimization techniques. We provide effective automatic compiler support for managing on-chip and off-chip memory accesses, with the compiler making effective decisions on what elements to move into and out of on-chip memory, when and how to move them, and how to efficiently access the elements brought into on-chip memory. We develop an effective tiling approach for mapping computation in regular programs onto many-core systems such as GPUs. We also develop an automatic approach for compiler-assisted dynamic scheduling of computation to enhance load balancing during parallel tiled execution on multi-core systems.

There are various issues specific to the target architecture that need attention to maximize application performance. First, the levels of parallelism available and the appropriate granularity of parallelism for the target architecture must be considered when mapping the computation. Second, the memory access model may be inherent to the architecture, and optimizations must be developed for that specific access model. We develop compile-time transformation approaches to address GPU-architecture-specific performance factors related to parallelism and data locality, and we build an end-to-end compiler framework for GPUs.
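The tiling transformation mentioned above can be illustrated with a minimal sketch. This is not the dissertation's compiler output; it is a hand-written Python analog of a tiled matrix multiply, where the hypothetical tile size `T` stands in for the amount of data that fits in fast on-chip memory (shared memory on a GPU, cache on a CPU), so each tile of the operands is reused while it is resident there.

```python
def matmul_tiled(A, B, T=2):
    # Minimal loop-tiling sketch for an n x n matrix multiply.
    # T is a hypothetical tuning parameter: the tile edge length.
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    # Outer loops walk over tiles; inner loops compute within a tile,
    # so each T x T block of A and B is reused while "on chip".
    for ii in range(0, n, T):
        for jj in range(0, n, T):
            for kk in range(0, n, T):
                for i in range(ii, min(ii + T, n)):
                    for j in range(jj, min(jj + T, n)):
                        s = C[i][j]
                        for k in range(kk, min(kk + T, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
    return C
```

The tiled loop nest computes exactly what the untiled triple loop would; only the iteration order changes, which is what improves locality without altering semantics.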
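The compiler-assisted dynamic scheduling idea can likewise be sketched under stated assumptions. The following is not the dissertation's runtime; it is a generic work-queue scheduler in Python (function names `run_tiles_dynamic` and `work` are hypothetical) in which ready tiles sit on a shared queue and idle workers pull the next tile, so faster workers naturally take on more tiles and the load stays balanced.

```python
import queue
import threading

def run_tiles_dynamic(tiles, work, n_workers=4):
    # Dynamic-scheduling sketch: tiles are enqueued up front and
    # worker threads repeatedly grab the next available tile.
    q = queue.Queue()
    for t in tiles:
        q.put(t)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                t = q.get_nowait()  # next ready tile, if any
            except queue.Empty:
                return  # no tiles left; this worker retires
            r = work(t)
            with lock:
                results[t] = r

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```

A real tiled-execution runtime would also track inter-tile dependences, releasing a tile to the queue only when its predecessors finish; this sketch assumes all tiles are independent.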
Keywords/Search Tags: Parallelism, Architecture, Systems, Processor cores, Multi-core, Many-core, Compile-time, Performance