
A High-performance Sparse Convolutional Neural Network Based On GPU

Posted on: 2019-04-18  Degree: Master  Type: Thesis
Country: China  Candidate: C Fang
GTID: 2428330611993253  Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
In recent years, with the continuous development of neural networks, the Convolutional Neural Network (CNN), an important branch of neural networks, has been widely used in many fields of artificial intelligence (AI), such as computer vision, speech recognition, and robotics. CNNs achieve remarkable accuracy on many complex AI problems, but their models keep growing and the number of parameters in the network is increasing rapidly. How to reduce resource consumption and raise the throughput of the whole network, without adding hardware overhead or losing accuracy, has therefore become an important topic in CNN research.

Weight pruning is a representative approach to compressing CNNs: it shrinks the parameter set by deleting parameters that have little influence on the accuracy of the results. A CNN compressed by weight pruning contains a large number of sparse data structures, and such a network is generally called a Sparse Convolutional Neural Network (SCNN). In practice, however, although the graphics processing unit (GPU) plays an increasingly important role, its architecture prevents it from handling the sparse data structures of an SCNN efficiently. Weight pruning can reduce the multiply-accumulate (MAC) operations of an SCNN by more than 70% compared with the original CNN, yet an SCNN implemented on a GPU often fails to achieve the expected performance improvement.

To address this performance loss on GPU platforms, this thesis studies how to optimize SCNNs for the GPU architecture, so as to greatly shorten SCNN execution time without losing accuracy or increasing hardware resource overhead. The main research results and innovations of this thesis are as follows:

1) A new combined method of convolution calculation is adopted. The convolution algorithm is selected according to the sparsity of each convolution layer: sparse convolution layers use the Direct Sparse Convolution algorithm, and dense convolution layers use the Lowering algorithm. Compared with using a single traditional convolution algorithm, this method exploits sparsity much more effectively to improve GPU computing performance.

2) The Direct Sparse Convolution algorithm is implemented on the GPU platform. Based on the architectural characteristics of the GPU, parallel optimization strategies called TILED and BLOCKED are implemented. These strategies make full use of the sparsity of the data and the structure of the network to allocate threads for task scheduling, and they exploit data locality to manage memory replacement. Both TILED and BLOCKED achieve significant acceleration. With the TILED optimization, the speedup over cuBLAS is 1.07× to 1.23×, 1.17× to 3.51×, and 1.32× to 5.00× on AlexNet, GoogLeNet, and ResNet, respectively, and the speedup over cuSPARSE is 1.31× to 1.42×, 1.09× to 2.00×, and 1.07× to 3.22× on AlexNet, GoogLeNet, and ResNet, respectively. In addition, BLOCKED improves further on TILED, raising performance by 38.1%, 27.8%, 18.6%, and 18.1%, respectively, across the layers of AlexNet.
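The per-layer choice between the Direct Sparse Convolution algorithm and the Lowering algorithm can be illustrated with a small CPU sketch in plain Python. It is not the thesis's GPU implementation. It assumes a single input image, stride 1, and no padding; the function names and the `SPARSITY_THRESHOLD` value are hypothetical and chosen only for illustration, since the abstract does not state how the switch point is determined.

```python
# Illustrative sketch only: single image, stride 1, no padding.
# SPARSITY_THRESHOLD is a hypothetical cut-off, not a value from the thesis.
SPARSITY_THRESHOLD = 0.7


def sparsity(weights):
    """Fraction of zero entries in a weight tensor indexed [m][c][r][s]."""
    flat = [w for m in weights for c in m for r in c for w in r]
    return sum(1 for w in flat if w == 0) / len(flat)


def _shapes(x, weights):
    """Return (C, M, R, S, E, F): channels, filters, kernel and output sizes."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    M, R, S = len(weights), len(weights[0][0]), len(weights[0][0][0])
    return C, M, R, S, H - R + 1, W - S + 1


def conv_lowering(x, weights):
    """Dense path: lower the input to a matrix (im2col), then multiply."""
    C, M, R, S, E, F = _shapes(x, weights)
    # One flattened receptive field per output position, in (c, r, s) order.
    cols = [[x[c][h + r][w + s]
             for c in range(C) for r in range(R) for s in range(S)]
            for h in range(E) for w in range(F)]
    # One flattened filter per output channel, in the same (c, r, s) order.
    rows = [[weights[m][c][r][s]
             for c in range(C) for r in range(R) for s in range(S)]
            for m in range(M)]
    out = [[[0.0] * F for _ in range(E)] for _ in range(M)]
    for m in range(M):
        for i, col in enumerate(cols):
            out[m][i // F][i % F] = sum(a * b for a, b in zip(rows[m], col))
    return out


def conv_direct_sparse(x, weights):
    """Sparse path: visit only nonzero weights and index the input directly."""
    C, M, R, S, E, F = _shapes(x, weights)
    # Compressed per-filter list of (c, r, s, value) for nonzero weights.
    nonzeros = [[(c, r, s, weights[m][c][r][s])
                 for c in range(C) for r in range(R) for s in range(S)
                 if weights[m][c][r][s] != 0]
                for m in range(M)]
    out = [[[0.0] * F for _ in range(E)] for _ in range(M)]
    for m in range(M):
        for c, r, s, v in nonzeros[m]:
            for h in range(E):
                for w in range(F):
                    out[m][h][w] += v * x[c][h + r][w + s]
    return out


def conv_layer(x, weights):
    """Pick the algorithm for one layer from the sparsity of its weights."""
    if sparsity(weights) >= SPARSITY_THRESHOLD:
        return conv_direct_sparse(x, weights)
    return conv_lowering(x, weights)
```

Both paths compute the same result. The sparse path skips every multiply-accumulate whose weight is zero and never builds the lowered matrix, which is the saving a pruned layer is meant to deliver.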
Keywords/Search Tags:Convolutional Neural Network, Graphics Processing Unit, Sparsity, Parallelism, Optimization