Research On Performance Prediction Of I/O Intensive Parallel Application

Posted on:2015-05-23

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhang

Full Text:PDF

GTID:2348330422990886

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

It can be observed that CPU speed is increasing much faster than that of disk I/Ospeed, and the gap between them is expanding continously. I/O activity can no longer beneglected under many circumenstances. Given this background, it became veryimportant to predict the performance of I/O operations accurately. Such a prediction isvery helpful in finding application bottleneck and evaluating application’s performance.Many parallel task scheduling algorithms rely on performance prediction ofparallel task. Performance prediction is a foundamental technique for task scheduling, itis to predict cost of parallel tasks in different environment. This paper constructs askeleton program which simulates the original program in every way to predict the costof original program. We can run the skeleton program to predict the cost of originalprogram.This paper utilized the PMPI interface to generate wrapper functions which captureruntime information of original program. Those wrapper function are compiled in ashared library, they can obtain information without recompiling the original program.Additionally, some technique to formalize traces is brought forward in this paper to helpthe compressing and merging of traces later.Because traces generated by the same program (but different processes) are similarto each other. We tested three different merging algorithms to effectively strip redundantdata in traces and compared their result. We find out that the basic merging algorithm isthe fastest of all, but not generic. The merging algorithm based on suffix tree has thebest therotically worst time bound, but our experiment implies its performance is not asgood as other algorithms. The generic algorithm based on LCS has near linear timecomplexity, and performs quite well in practice.The trace of parallel programs is equvilant of recording everything after unwindingall loops in original program, making the size of traces very huge. All of this kind ofloop structure can be found and compressed by our loop compressing algorithm.Founding loops in trace can be generalized as the problem of finding all primitiverandem arrays in a string. The loop compressing algorithm is based on the data-structureof suffix tree and have the complexity of O(nlogn) even in worst case. We found out theperformance of our algorithm is better than all existing loop compressing algorithm, theexperiment shows our algorithm runs5~10times quicker than the next best algorithm.We can simulate the behavior of original program based on the compressed traces,the skeleton program can be constructed automatically and predict the performance ofthe corresponding original program. We carried out various experiments using IORparallel I/O benchmark, NPB-BTIO benchmark and MPI-TILE-IO benchmark, the result shows that skeleton programs constructed by this paper can simulate both thecomputing behavior and I/O behavior of original program accurately.

Keywords/Search Tags:

distributed computing, MPI, primitive tandem array, suffix tree, performance skeleton

PDF Full Text Request

Related items

1	Research On Performance Skeleton Based Performance Evaluation Of Parallel Tasks
2	Location And Analysis Of Repetitive Sequences In Genomes
3	Finding MUMs With Enhanced Suffix Arrays
4	Research On Construction Of Index Structure For Biological Sequences
5	Multi-pattern Matching With Wildcards Based On Suffix Tree And Suffix Array
6	Research Of Finding Maximal Unique Matches In Genome
7	Research And Implementation Of Distributed And Parallel Algorithm For Large-Scale Generalized Suffix Tree Construction
8	Engineering Implementation And Performance Evaluation Of An External Memory Suffix Array Construction
9	Research On P2P Searching Technology Based On Distributed Suffix Tree
10	Performance Optimization Of A New External Memory Algorithm For Suffix Array Construction