
Research On Optimization Methods Based On Decoupled Access/Execute Architecture

Posted on: 2015-11-28, Degree: Doctor, Type: Dissertation
Country: China, Candidate: Z Zhao, Full Text: PDF
GTID: 1318330518978671, Subject: Computer software and theory
Abstract/Summary:
The Decoupled Access/Execute (DAE) architecture can improve the memory access performance of processors. By separating computation from memory access, it allows the two to proceed with a high degree of concurrency. However, developing and optimizing software for these processors remains difficult: coding is hard, basic libraries are lacking, and there is no program execution model for performance analysis. This paper explores coding and optimization theory for the DAE architecture in order to improve the efficiency of applications on these platforms. It introduces theories and methods of program optimization based on the features of the DAE architecture. It also designs a clustering programming model that helps programmers restructure algorithms so that they closely match those features. The work is divided into four parts: a program execution model, basic optimization techniques, a model-driven optimization theory, and a clustering programming model. The specific contents are as follows.

Since few program execution models exist for performance analysis on DAE processors, this paper proposes a model that makes full use of the features of multi-layer storage and introduces a parameter that measures the overhead of launching instructions. The model also introduces a variable for the ratio of computation to memory operations, with the aim of establishing the relationship between this ratio and program performance. The model can guide program optimization theoretically in several respects, e.g., bandwidth utilization and multi-channel memory access.

To address how poorly existing BLAS libraries are matched to the DAE architecture, several approaches to optimizing the level-2 BLAS library (BLAS2) on the DAE architecture are proposed. Using the Godson-3B processor, this paper introduces several methods that make full use of the hardware features of a DAE processor to improve memory access performance. GEMV, the kernel function of BLAS2, is selected to demonstrate the optimization. Experiments show that the best performance of the optimized GEMV routines on Godson-3B exceeds that of all other BLAS libraries on Godson-3B.

This paper introduces a model-guided optimization method for DAE processors to overcome the low efficiency of optimization on the DAE architecture. First, a multi-layer general dense matrix multiplication (GEMM) algorithm is introduced. The layers of this GEMM are loosely coupled, so a locally optimal solution can be reached by optimizing layer by layer. Second, because memory access in the DAE architecture is decoupled from the execution unit, instrumentation is used to add a module that collects memory access status to the control code of the access processor. In addition, a fetch performance evaluation system, DAEFS, is introduced. Because two levels of the 4-GEMM are self-adjusting, DAEFS is of great help in collecting information about the relationship between fetching and calculation on DAE processors. With the help of DAEFS, the balance between memory access and calculation is adjusted, and program performance improves as a result.

This paper proposes a clustering programming model based on the DAE architecture to increase the concurrency of programs. The model divides the execution flow of a program into several modules. A Data Flow Diagram (DFD) is then used to describe the dependencies among these modules. Based on the DFD, a heuristic clustering algorithm over directed graphs is proposed to classify the modules. The algorithm is platform independent, easy to implement, and has little dependence on specific hardware. A GPGPU based on the Kepler architecture is chosen as the experimental platform, on which the data decoupling classification is applied to a deep neural network (DNN) algorithm. Experimental results show that the parallel computation of the restructured DNN is accelerated greatly, which improves the efficiency of the algorithm on the GPGPU computing platform.
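
As a rough illustration of why the ratio of computation to memory operations matters for GEMV and GEMM, the sketch below computes that ratio for both kernels using standard operation counts. It is not the execution model proposed in the dissertation. The function names, the assumption of minimum memory traffic, and the `decoupled_time` estimate (which assumes access and execute overlap perfectly and adds a single launch-overhead term) are all hypothetical simplifications.

```python
def gemv_intensity(m, n):
    """Flops per element moved for y = A @ x + y, assuming minimum traffic."""
    flops = 2 * m * n                 # one multiply and one add per entry of A
    elements = m * n + n + 2 * m      # read A, x, y; write y
    return flops / elements


def gemm_intensity(m, n, k):
    """Flops per element moved for C = A @ B + C, assuming minimum traffic."""
    flops = 2 * m * n * k
    elements = m * k + k * n + 2 * m * n  # read A, B, C; write C
    return flops / elements


def decoupled_time(flops, elements, peak_flops, bandwidth, launch_overhead=0.0):
    """Hypothetical estimate: compute and memory streams overlap fully,
    so the slower stream sets the time; launch_overhead is added on top."""
    t_compute = flops / peak_flops
    t_memory = elements / bandwidth
    return max(t_compute, t_memory) + launch_overhead


if __name__ == "__main__":
    print(gemv_intensity(1000, 1000))        # stays below 2 for any size
    print(gemm_intensity(1000, 1000, 1000))  # grows with the matrix size
```

Under these assumptions the GEMV ratio never reaches 2 flops per element regardless of matrix size, which is why a BLAS2 kernel is limited by memory access, whereas the GEMM ratio grows with the problem size and leaves room for the layer-by-layer tuning described above.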
Keywords/Search Tags: Decoupled Access/Execute Architecture, GPGPU, Optimization theory, Godson-3B, Program Execution Model, Programming model, Basic math library, Deep Neural Network