A High-performance Sparse Convolutional Neural Network Based On GPU

Posted on:2019-04-18

Degree:Master

Type:Thesis

Country:China

Candidate:C Fang

GTID:2428330611993253

Subject:Microelectronics and Solid State Electronics

Abstract/Summary:

In recent years, with the continuous development of neural networks, the Convolutional Neural Network (CNN), an important branch of neural networks, has been widely used in many fields of artificial intelligence (AI), such as computer vision, speech recognition, and robotics. CNNs achieve remarkable accuracy on many complex AI problems, but their models keep growing, and the number of parameters in the network is increasing rapidly. How to reduce resource consumption and improve the throughput of the entire network during CNN processing, without increasing hardware overhead or losing accuracy, has therefore become an important branch of CNN research. As a representative approach to compressing CNNs, weight pruning reduces the number of CNN parameters by deleting parameters that have little influence on the accuracy of the results. A CNN model compressed by weight pruning contains a large number of sparse data structures; this type of CNN is generally called a Sparse Convolutional Neural Network (SCNN).

In practice, however, even as the graphics processing unit (GPU) grows in importance, its architecture is poorly suited to handling the sparse data structures of an SCNN efficiently. Although weight pruning can reduce the multiply-accumulate (MAC) operations of an SCNN by more than 70% compared with the original CNN, SCNN implementations on the GPU architecture often fail to achieve the expected performance improvement. To address this performance loss on the GPU platform, this thesis studies how to optimize SCNN for the GPU architecture so as to greatly shorten SCNN execution time without losing accuracy or increasing hardware resource overhead.

The main research results and innovations of this thesis are as follows:

1) A new combined approach to convolution computation is adopted: the convolution algorithm is selected according to the sparsity of each convolution layer. Sparse convolution layers use the Direct Sparse Convolution algorithm, and dense convolution layers use the Lowering algorithm. Compared with using a single traditional convolution algorithm, this method makes much better use of sparsity to improve GPU computing performance.

2) The Direct Sparse Convolution algorithm is implemented on the GPU platform. Based on the architectural characteristics of the GPU, parallel optimization strategies based on TILED and BLOCKED are implemented. These strategies make full use of the sparsity of the data and the network structure to allocate threads for task scheduling, and exploit data locality to manage memory replacement. Both TILED and BLOCKED achieve significant acceleration. With the TILED optimization, the speedup over cuBLAS is 1.07× to 1.23×, 1.17× to 3.51×, and 1.32× to 5.00× on AlexNet, GoogleNet, and ResNet, respectively, and the speedup over cuSPARSE is 1.31× to 1.42×, 1.09× to 2.00×, and 1.07× to 3.22× on the same three networks. In addition, BLOCKED provides a further improvement on top of TILED, improving performance by 38.1%, 27.8%, 18.6%, and 18.1%, respectively, at each layer of AlexNet.
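The idea of pruning weights and then choosing a convolution algorithm per layer according to its sparsity can be illustrated with a small CPU-side sketch. This is not the thesis's GPU implementation and does not include the TILED or BLOCKED optimizations; the function names, the single-channel 2-D setting, and the 0.6 sparsity threshold are all assumptions made for illustration only.

```python
# Illustrative sketch only: all names and the 0.6 threshold are hypothetical,
# not taken from the thesis. Single-channel, stride-1, no padding.

def prune(kernel, threshold):
    """Magnitude-based weight pruning: zero out weights with |w| < threshold."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in kernel]

def sparsity(kernel):
    """Fraction of weights that are exactly zero."""
    flat = [w for row in kernel for w in row]
    return sum(1 for w in flat if w == 0.0) / len(flat)

def output_shape(image, kernel):
    return len(image) - len(kernel) + 1, len(image[0]) - len(kernel[0]) + 1

def conv_lowering(image, kernel):
    """Dense path: lower each input window to a row (im2col), then take
    its dot product with the flattened kernel."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = output_shape(image, kernel)
    flat_kernel = [w for row in kernel for w in row]
    out = []
    for y in range(oh):
        out_row = []
        for x in range(ow):
            patch = [image[y + r][x + s] for r in range(kh) for s in range(kw)]
            out_row.append(sum(p * w for p, w in zip(patch, flat_kernel)))
        out.append(out_row)
    return out

def conv_direct_sparse(image, kernel):
    """Sparse path: keep only nonzero weights as (row, col, value) triples
    and accumulate their contributions directly, skipping all zeros."""
    oh, ow = output_shape(image, kernel)
    nonzeros = [(r, s, w)
                for r, row in enumerate(kernel)
                for s, w in enumerate(row) if w != 0.0]
    out = [[0.0] * ow for _ in range(oh)]
    for r, s, w in nonzeros:
        for y in range(oh):
            for x in range(ow):
                out[y][x] += w * image[y + r][x + s]
    return out

def convolve(image, kernel, sparsity_threshold=0.6):
    """Pick the algorithm per layer from its sparsity (threshold is assumed)."""
    if sparsity(kernel) >= sparsity_threshold:
        return "direct_sparse", conv_direct_sparse(image, kernel)
    return "lowering", conv_lowering(image, kernel)

image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
dense_kernel = [[0.5, 0.01, -0.02], [0.03, 2.0, 0.0], [-0.01, 0.02, -1.0]]
pruned_kernel = prune(dense_kernel, threshold=0.1)

print(convolve(image, dense_kernel)[0])   # lowering
print(convolve(image, pruned_kernel))     # ('direct_sparse', [[1.5, 3.0], [7.5, 9.0]])
```

In this toy example the pruned kernel keeps 3 of 9 weights, so the direct sparse path performs a third of the multiply-accumulate operations of the lowering path while producing the same result. On a GPU, whether that reduction translates into a shorter run time depends on how threads and memory accesses are organized, which is the problem the thesis addresses.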

Keywords/Search Tags:

Convolutional Neural Network, Graphics Processing Unit, Sparsity, Parallelism, Optimization

Related items

1	Research Of Programming Analysis And Parallelism Based On Graphics Processing Unit
2	The Improvement And Parallel Optimization Of Image Super-Resolution Using Convolutional Neural Network
3	Design Of Graphics Processing Unit For Embedded Systems
4	The Research And Design Of Universal Processing Engine In Embedded Graphics Processing Unit
5	Design And Realize The IP Core Of 3D Embedded Graphics Processing Unit Based On FPGA
6	Design Of Convolutional Neural Network Processing Unit Model Based On SystemC
7	Research And Implementation On Graphics Pipeline Of Graphic Processing Unit
8	The Optimization And Implementation Of X.264 Video Encoder On The Platform Of Graphics Processing Unit
9	Research On Acceleration Method Of Deep Convolutional Neural Networks Based On Hybrid Parallelism