An Auto Performance Predict Research For Scientific Program Based On LLVM

Posted on:2016-07-21

Degree:Master

Type:Thesis

Country:China

Candidate:H C Xie

Full Text:PDF

GTID:2308330479491068

Subject:Computer technology

Abstract/Summary:

Computer hardware has advanced enormously in recent years, and large-scale and parallel computing have received unprecedented attention. The related performance analysis and evaluation techniques are an equally active topic. Analyzing scientific computing and parallel programs (scientific programs, for short) differs markedly from analyzing traditional single-process programs. Scientific programs are usually computationally intensive and highly parallel, and usually do not require third-party libraries. They are often SPMD (Single Program, Multiple Data) and communicate through MPI.

A performance model describes the characteristics of a program. The most direct approach is to predict the program's run time, and such a model is usually a set of formulas. We instead use a program, called Dwarf Code (DC for short), as the performance model to predict run time. DC is generated from the original program using LLVM and compiler techniques. At the IR level, we analyze the loop trip counts in the program, combine them with LLVM's static branch probabilities, and generate and instrument IR code that calculates the execution count of each basic block; we then promote the view port to adjust the position of the instrumentation. We also locate the communication statements, generate code that calculates the message sizes, and instrument them. After that, we analyze data dependencies to reduce the original code, which makes DC run faster than the original program.

The prediction is then carried out. Because DC is generated from the original program, its input format stays the same. Running DC outputs a profiling file that contains the predicted basic block counts. Combining these counts with machine characteristics gives the predicted run time of the original program, as well as more detailed predictions such as the execution count of each basic block and function and the time spent in communication.

Our biggest contribution is to propose the concept of the view port.
It unifies two extremes, static analysis and dynamic edge profiling: static means prediction, and dynamic means accuracy. Our biggest innovation is to abandon the traditional assumption that a compiler must always perform equivalent transformations. We use a destructive reduction method instead: while trying our best to preserve the program's characteristics, we give up the correctness of the output. This ensures that the reduction is effective and therefore makes DC run faster. We also propose the concepts of prediction price and prediction cost-effectiveness. Only when the prediction cost-effectiveness is greater than 1 is the prediction worth making.

We point out that a performance model is determined by both program characteristics and machine characteristics; separating them makes DC's output portable and independent of the target machine. DC is easy to use: it requires no knowledge of related fields, no parameter settings, no configuration file, no knowledge of the source code, and no training in advance. Its disk space, time, and memory overheads are all small relative to the original program.

In the CGPOP and NPB experiments, after analyzing the DC results, we point out the shortcomings of using static branch probabilities. Finally, we will release all code and experiment data under the GNU General Public License version 3, so that anyone can reproduce the experiments, check for errors, or fork the project.
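
The step that combines predicted basic block counts with machine characteristics can be illustrated with a minimal Python sketch. It is not the thesis implementation: the per-block cycle costs, the latency-plus-bandwidth communication model, and every name and number below are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical machine characteristics (not taken from the thesis)."""
    seconds_per_cycle: float      # 1 / clock frequency
    latency_s: float              # assumed fixed startup cost per message
    bandwidth_bytes_per_s: float  # assumed sustained link bandwidth


def predict_run_time(block_counts, block_cycles, message_sizes, machine):
    """Combine predicted block counts with machine characteristics.

    block_counts:  {block name: predicted execution count}
    block_cycles:  {block name: assumed cycles per execution on this machine}
    message_sizes: list of predicted message sizes in bytes
    Returns (compute_seconds, communication_seconds, total_seconds).
    """
    # Program characteristics (counts) times machine characteristics (cycles).
    cycles = sum(count * block_cycles[name]
                 for name, count in block_counts.items())
    compute = cycles * machine.seconds_per_cycle
    # Assumed cost model: each message costs latency + size / bandwidth.
    comm = sum(machine.latency_s + size / machine.bandwidth_bytes_per_s
               for size in message_sizes)
    return compute, comm, compute + comm


# Made-up numbers, for illustration only.
counts = {"entry": 1, "loop.body": 1_000_000, "exit": 1}
cycles = {"entry": 20, "loop.body": 8, "exit": 10}
machine = Machine(seconds_per_cycle=1e-9, latency_s=1e-6,
                  bandwidth_bytes_per_s=1e9)
compute, comm, total = predict_run_time(counts, cycles, [4096, 4096], machine)
print(f"compute={compute:.6f}s comm={comm:.6f}s total={total:.6f}s")
```

Because the block counts and message sizes are kept apart from the Machine values, the same counts can be reused with a different Machine, which mirrors the separation of program and machine characteristics described above.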

Keywords/Search Tags:

Parallel Analysis, Performance Model, LLVM, Compile Optimize, Data Dependency Analysis, Loop Trip Count

Related items

1	Research On Parallel Compilation Technology Of Sunway Processor Based On LLVM
2	Research On Loop Vectorization In LLVM
3	Design And Implementation Of An Activity-constrained Dependency-aware Trip Planning System
4	Analysis Of Station Level Data And Trip Data In Bicycle Sharing System
5	Research On Profile-Guided SIMD Vectorization Identification And Optimization
6	A Compile-time Optimization Method For Heterogeneous Computing Platforms Based On LLVM
7	Research On Automatic Generation Of Analytical Performance Model For Parallel Program
8	An Auto Performance Profiling For Parellel Programs On LLVM
9	On Parallel System Performance Analysis Based On TCPN
10	Design And Research On A Parallel Performance Data Collection,Representation And Analysis Framwork For The SMP-Cluster Architecture