
Fast Sparse Deep Neural Network Inference On GPU

Posted on: 2023-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: J Xin
Full Text: PDF
GTID: 2558307043974949
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of artificial intelligence, the parameter sizes of deep neural network models keep growing. Researchers therefore use techniques such as pruning to transform dense weight matrices into sparse ones, reducing both the storage cost and the computational overhead of the models. At the same time, with growing GPU computing power and advances in high-performance computing techniques, fast GPU-based inference systems for deep neural networks have matured.

The core operation of sparse deep neural network inference is sparse matrix-dense matrix multiplication (SpMM). SpMM performance is closely tied to data features such as the distribution of nonzero elements in the sparse matrices. As a result, different optimization methods achieve different effects on different data sets, and no single optimization method achieves optimal performance on all of them.

To address this problem, a GPU-based Fast Sparse DNN Inference system (FSDI) is proposed. FSDI takes data features such as the nonzero-element distribution of the sparse matrices as input, builds a model of the SpMM optimization space, and uses those features to search it for a well-suited method. Specifically, the SpMM optimization methods are first abstracted into a search space built from four loop transformations: loop tiling, loop parallelization, loop scheduling, and loop compaction. Second, a performance evaluation model that accounts for load balancing and memory access cost is proposed; combined with the features of the sparse matrix, it selects a suitable SpMM optimization method. Finally, the search is accelerated by pruning the optimization space according to the characteristics of the GPU architecture.

In addition, to deal with the large amount of intermediate data produced between consecutive operators, a sparsity-aware operator fusion mechanism is proposed. Through sparsity analysis, FSDI fuses multiple SpMM operations that have few computational dependencies and keeps the intermediate results in shared memory, improving performance by reducing global memory access overhead. To further increase the number of fusible operators, a locality-aware, hashing-based data rearrangement method reorders the nonzero elements, increasing the locality of the sparse matrix.

Performance tests are conducted on networks consisting of fully connected layers with 1024 to 65536 neurons. The results show performance improvements of 1.73x to 13.74x over the H&P system on a single V100 GPU.
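For reference, SpMM computes C = A * B for a sparse matrix A (here in CSR format) and a dense matrix B. The following minimal CUDA kernel is an illustrative sketch of this core operation, not code from the thesis; it uses one thread per output element and none of the loop-transformation optimizations FSDI searches over:

    // Minimal CSR SpMM: C[m][n] = sum_k A[m][k] * B[k][n], with A sparse in CSR.
    // One thread per output element; purely illustrative baseline.
    __global__ void spmm_csr_naive(int num_rows, int n_cols,
                                   const int* __restrict__ row_ptr,
                                   const int* __restrict__ col_idx,
                                   const float* __restrict__ vals,
                                   const float* __restrict__ B,   // dense, k x n, row-major
                                   float* __restrict__ C)         // dense, m x n, row-major
    {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= num_rows || col >= n_cols) return;

        float acc = 0.0f;
        for (int p = row_ptr[row]; p < row_ptr[row + 1]; ++p)
            acc += vals[p] * B[col_idx[p] * n_cols + col];
        C[row * n_cols + col] = acc;
    }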
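The search described above needs a performance model weighing load balance against memory access cost. The host-side sketch below scores candidate row-tile sizes from the CSR row pointers; the concrete scoring formula and weights are assumptions made for illustration, not the thesis's actual model:

    #include <vector>
    #include <algorithm>
    #include <cstdint>

    // Hypothetical cost model: score one candidate row-tile size using the two
    // terms the abstract names, load imbalance and memory access cost.
    double score_tile(const std::vector<int>& row_ptr, int tile_rows) {
        int num_rows  = (int)row_ptr.size() - 1;
        int num_tiles = (num_rows + tile_rows - 1) / tile_rows;

        int64_t total_nnz    = row_ptr[num_rows];
        int64_t max_tile_nnz = 0;
        for (int t = 0; t < num_tiles; ++t) {
            int lo = t * tile_rows;
            int hi = std::min(lo + tile_rows, num_rows);
            max_tile_nnz = std::max<int64_t>(max_tile_nnz, row_ptr[hi] - row_ptr[lo]);
        }
        // Load imbalance: heaviest tile relative to the average tile.
        double avg_tile_nnz = (double)total_nnz / num_tiles;
        double imbalance    = (double)max_tile_nnz / std::max(avg_tile_nnz, 1.0);

        // Memory cost proxy: one dense-row access per nonzero, amortized per tile.
        double mem_cost = (double)total_nnz / tile_rows;

        return imbalance + 0.001 * mem_cost;   // lower is better; weight is arbitrary
    }

    // The search keeps the candidate with the lowest predicted cost; the real
    // system prunes this space using GPU architecture characteristics.
    int pick_tile(const std::vector<int>& row_ptr) {
        int best = 32;
        double best_score = 1e300;
        for (int t : {16, 32, 64, 128, 256}) {
            double s = score_tile(row_ptr, t);
            if (s < best_score) { best_score = s; best = t; }
        }
        return best;
    }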
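The full operator-fusion mechanism depends on a sparsity and dependency analysis that the abstract only names. As a simplified stand-in, the sketch below fuses an SpMM with a ReLU epilogue, staging the intermediate row tile in shared memory so it never round-trips through global memory; in the real multi-SpMM case, a dependent fused operator would consume that staged tile:

    #define COL_TILE 128

    // Launch with grid((n_cols + COL_TILE - 1) / COL_TILE, num_rows), block(COL_TILE).
    __global__ void spmm_relu_fused(int num_rows, int n_cols,
                                    const int* __restrict__ row_ptr,
                                    const int* __restrict__ col_idx,
                                    const float* __restrict__ vals,
                                    const float* __restrict__ B,
                                    float* __restrict__ C)
    {
        __shared__ float tile[COL_TILE];    // intermediate result, kept on chip
        int row = blockIdx.y;               // one block row per sparse row
        int col = blockIdx.x * COL_TILE + threadIdx.x;
        bool active = (row < num_rows && col < n_cols);

        float acc = 0.0f;
        if (active)
            for (int p = row_ptr[row]; p < row_ptr[row + 1]; ++p)
                acc += vals[p] * B[col_idx[p] * n_cols + col];

        tile[threadIdx.x] = acc;            // stage intermediate in shared memory
        __syncthreads();                    // a fused consumer would read it here

        if (active)                         // fused epilogue, single global write
            C[row * n_cols + col] = fmaxf(tile[threadIdx.x], 0.0f);
    }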
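Finally, one plausible realization of the locality-aware, hashing-based rearrangement (the signature choice below is an assumption) is to sort rows by a hash of their column pattern, so that structurally similar rows become adjacent and dense rows of B are reused by neighboring sparse rows:

    #include <vector>
    #include <numeric>
    #include <algorithm>
    #include <cstdint>

    // Hypothetical locality-aware reordering: compute a 64-bit column-bucket
    // signature per row and sort rows by it, clustering rows that touch
    // similar column ranges. Returns the row permutation.
    std::vector<int> reorder_rows(const std::vector<int>& row_ptr,
                                  const std::vector<int>& col_idx) {
        int num_rows = (int)row_ptr.size() - 1;
        std::vector<uint64_t> sig(num_rows, 0);
        for (int r = 0; r < num_rows; ++r)
            for (int p = row_ptr[r]; p < row_ptr[r + 1]; ++p)
                sig[r] |= 1ull << (col_idx[p] % 64);   // hash columns into 64 buckets

        std::vector<int> perm(num_rows);
        std::iota(perm.begin(), perm.end(), 0);
        std::stable_sort(perm.begin(), perm.end(),
                         [&](int a, int b) { return sig[a] < sig[b]; });
        return perm;   // perm[i] = original index of the i-th row after reordering
    }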
Keywords/Search Tags: Parallel Computing, Graphics Processing Unit, Sparse Deep Neural Network, Sparse Matrix-dense Matrix Multiplication