
Translation of OpenMP to dataflow execution model for data locality and efficient parallel execution

Posted on: 2004-06-13
Degree: Ph.D.
Type: Dissertation
University: University of Houston
Candidate: Weng, Tien-hsiung
Full Text: PDF
GTID: 1466390011461759
Subject: Computer Science
Abstract/Summary:
OpenMP has become widely accepted for shared-memory programming. Not only is it easier for a non-expert programmer to develop a parallel application under OpenMP than under MPI, the de facto message-passing standard, but it is also possible to develop portable OpenMP applications incrementally. However, it is up to the user to ensure that performance does not suffer from poor cache locality or high synchronization overheads: poor data locality and barrier synchronizations have a particularly large impact on the performance of applications on cache-coherent Non-Uniform Memory Access (ccNUMA) machines. Moreover, most OpenMP compilers perform little or no optimization.

In this dissertation, we propose translating OpenMP to a dataflow execution model, realized using the SMARTS runtime system, instead of translating it directly to a multi-threaded program as most compilers do. Our goal is to perform analyses that enable us to improve data locality and reduce synchronization overheads. The compiler transforms an OpenMP code into a collection of sequential tasks and a task graph that indicates their execution constraints. It then specifies the mapping of tasks to processors (or, equivalently, to threads that are bound to processors). The main components of the transformation are eliminating constraints in the task graph and finding a good compile-time mapping of the task graph's nodes to processors that favors data locality and reuse. Our experimental results show that our strategy can outperform straightforward OpenMP code, and that the resulting code also scales well.
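The idea of replacing barriers with a task graph and a locality-aware mapping can be illustrated with a small sketch. It is not the dissertation's compiler and does not use the SMARTS API. The two-loop program, the block decomposition, and the names `build_task_graph`, `map_tasks`, and `schedule` are all hypothetical, chosen only to show the shape of the transformation.

```python
from collections import defaultdict, deque

# Hypothetical program: two parallel loops over an array split into blocks.
# The loop1 task for block b writes block b. The loop2 task for block b
# reads blocks b and b-1. Fork-join OpenMP would place a barrier between
# the loops; here each loop2 task waits only for the tasks it depends on.


def build_task_graph(num_blocks):
    """Return {task: set of tasks it depends on}; a task is (loop, block)."""
    deps = {("loop1", b): set() for b in range(num_blocks)}
    for b in range(num_blocks):
        needed = {("loop1", b)}
        if b > 0:
            needed.add(("loop1", b - 1))
        deps[("loop2", b)] = needed
    return deps


def map_tasks(deps, num_procs, num_blocks):
    """Static mapping: a task runs on the processor that owns its block."""
    blocks_per_proc = -(-num_blocks // num_procs)  # ceiling division
    return {task: task[1] // blocks_per_proc for task in deps}


def schedule(deps):
    """Topological order: a task is ready once all its predecessors ran."""
    remaining = {task: len(d) for task, d in deps.items()}
    successors = defaultdict(list)
    for task, d in deps.items():
        for pred in d:
            successors[pred].append(task)
    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for succ in sorted(successors[task]):
            remaining[succ] -= 1
            if remaining[succ] == 0:
                ready.append(succ)
    return order


deps = build_task_graph(num_blocks=4)
mapping = map_tasks(deps, num_procs=2, num_blocks=4)
order = schedule(deps)
```

In this sketch `("loop2", 0)` depends only on `("loop1", 0)`, so a dataflow runtime could start it without waiting for the other first-loop tasks, and both tasks are mapped to the same processor so that block 0 is likely to be reused from that processor's cache.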
Keywords/Search Tags: OpenMP, Data locality, Execution