Shared memory optimizations for distributed memory programming models

Posted on: 2014-01-04
Degree: Ph.D
Type: Thesis
University: Indiana University
Candidate: Friedley, Andrew
Full Text: PDF
GTID: 2458390008951278
Subject: Computer Science
Abstract/Summary:
In parallel programming there are two major classes of programming models: shared memory and distributed memory. Shared memory models share all memory by default and are most effective on multi-processor systems. Distributed memory models separate memory into distinct regions for each execution context and are most effective on a network of processors. Modern and future High Performance Computing (HPC) systems combine multi- and many-core processors connected by a network, resulting in a hybrid shared and distributed memory environment, and neither programming model is ideal in both settings. Optimizing parallel performance for both memory models simultaneously is therefore a major challenge, now and in the future. MPI (Message Passing Interface) is the de facto standard for distributed memory programming, but delivers less than ideal performance when used in a shared memory environment: message passing incurs overhead in the form of unnecessary data copying as well as queuing, ordering, and matching rules.

In this thesis, we present a series of techniques that optimize MPI performance in a shared memory environment, helping to address the challenge of optimizing parallel performance for both distributed and shared memory. We introduce the concept of a shared memory heap, in which dynamically allocated memory is shared by default among all MPI processes within a node. We then use the shared heap to transparently optimize message passing with two new data transfer protocols. Next, we propose an MPI extension for ownership passing, which eliminates data copying overheads completely: instead of copying data, we transfer control (ownership) of communication buffers. Finally, we explore how shared memory techniques can be applied in the context of MPI and the shared memory heap; loop fusion is a new technique that combines the packing and unpacking code on two different MPI ranks to eliminate explicit communication.

All of these techniques are implemented in a freely available software library named Hybrid MPI (HMPI). We experimentally evaluate our work using a variety of micro-benchmarks and mini-applications. In the mini-applications, communication performance is improved by up to 46% by our data transfer protocols, 54% by ownership passing, and 63% by loop fusion.
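To make the shared memory heap and ownership passing ideas above concrete, the following is a minimal sketch, not the HMPI API. It assumes two MPI ranks running on the same node and uses a hand-built POSIX shared memory segment ("/hmpi_demo_heap") as a stand-in for the shared heap; all names, offsets, and the segment layout are illustrative assumptions. Instead of pushing the payload through the MPI library, rank 0 sends only the buffer's offset within the shared region and relinquishes ownership of that buffer, and rank 1 reads the data in place.

/*
 * Minimal sketch (not the HMPI API): two MPI ranks on one node share a
 * POSIX shared-memory segment as a stand-in for the shared memory heap,
 * and "ownership passing" sends only an offset instead of the payload.
 * Build with mpicc (add -lrt on older glibc); run with two ranks on one node.
 */
#include <mpi.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HEAP_BYTES (1 << 20)   /* size of the demo "shared heap" */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Both ranks map the same named region; HMPI sets up its shared heap
     * transparently, here it is done by hand purely for illustration. */
    int fd = shm_open("/hmpi_demo_heap", O_CREAT | O_RDWR, 0600);
    (void)ftruncate(fd, HEAP_BYTES);
    char *heap = mmap(NULL, HEAP_BYTES, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);

    if (rank == 0) {
        /* "Allocate" a buffer at offset 0 of the shared heap and fill it. */
        size_t offset = 0;
        strcpy(heap + offset, "payload built in place by rank 0");
        /* Ownership passing: send only the offset (one unsigned long,
         * assuming size_t fits), then stop touching the buffer. */
        MPI_Send(&offset, 1, MPI_UNSIGNED_LONG, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        size_t offset;
        MPI_Recv(&offset, 1, MPI_UNSIGNED_LONG, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        /* The receiver now owns the buffer and reads it in place:
         * the payload itself was never copied. */
        printf("rank 1 reads in place: %s\n", heap + offset);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0) shm_unlink("/hmpi_demo_heap");
    munmap(heap, HEAP_BYTES);
    MPI_Finalize();
    return 0;
}

The point of ownership passing shows up in the message size: only a few bytes (the offset) flow through MPI while the payload is never copied. In the thesis this exchange is expressed through an ownership passing extension to the MPI interface rather than a raw offset over a hand-mapped segment.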
Keywords/Search Tags:Shared memory, Programming, Models, MPI, Performance, Passing, Data