Font Size: a A A

Research On Performance Prediction Of I/O Intensive Parallel Application

Posted on:2015-05-23Degree:MasterType:Thesis
Country:ChinaCandidate:Y ZhangFull Text:PDF
GTID:2348330422990886Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
It can be observed that CPU speed is increasing much faster than that of disk I/Ospeed, and the gap between them is expanding continously. I/O activity can no longer beneglected under many circumenstances. Given this background, it became veryimportant to predict the performance of I/O operations accurately. Such a prediction isvery helpful in finding application bottleneck and evaluating application’s performance.Many parallel task scheduling algorithms rely on performance prediction ofparallel task. Performance prediction is a foundamental technique for task scheduling, itis to predict cost of parallel tasks in different environment. This paper constructs askeleton program which simulates the original program in every way to predict the costof original program. We can run the skeleton program to predict the cost of originalprogram.This paper utilized the PMPI interface to generate wrapper functions which captureruntime information of original program. Those wrapper function are compiled in ashared library, they can obtain information without recompiling the original program.Additionally, some technique to formalize traces is brought forward in this paper to helpthe compressing and merging of traces later.Because traces generated by the same program (but different processes) are similarto each other. We tested three different merging algorithms to effectively strip redundantdata in traces and compared their result. We find out that the basic merging algorithm isthe fastest of all, but not generic. The merging algorithm based on suffix tree has thebest therotically worst time bound, but our experiment implies its performance is not asgood as other algorithms. The generic algorithm based on LCS has near linear timecomplexity, and performs quite well in practice.The trace of parallel programs is equvilant of recording everything after unwindingall loops in original program, making the size of traces very huge. All of this kind ofloop structure can be found and compressed by our loop compressing algorithm.Founding loops in trace can be generalized as the problem of finding all primitiverandem arrays in a string. The loop compressing algorithm is based on the data-structureof suffix tree and have the complexity of O(nlogn) even in worst case. We found out theperformance of our algorithm is better than all existing loop compressing algorithm, theexperiment shows our algorithm runs5~10times quicker than the next best algorithm.We can simulate the behavior of original program based on the compressed traces,the skeleton program can be constructed automatically and predict the performance ofthe corresponding original program. We carried out various experiments using IORparallel I/O benchmark, NPB-BTIO benchmark and MPI-TILE-IO benchmark, the result shows that skeleton programs constructed by this paper can simulate both thecomputing behavior and I/O behavior of original program accurately.
Keywords/Search Tags:distributed computing, MPI, primitive tandem array, suffix tree, performance skeleton
PDF Full Text Request
Related items