
The Acceleration And Compression Of Convolutional Neural Networks

Posted on: 2018-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: W J Chen
GTID: 2348330536978593
Subject: Engineering
Abstract/Summary:
Artificial intelligence has swept through academia and industry worldwide, and deep neural networks play a pivotal role in it. They first achieved a major breakthrough in computer vision, followed by speech, natural language processing, and other fields. Their accuracy in many applications has gradually come to meet the standards of industrial production, owing to their deep structure and vast amounts of training data. However, these two characteristics also lead to large, redundant models and high computational complexity, which are obstacles to industrial deployment. Compressing and accelerating such redundant models is therefore of great academic value and engineering significance. In this thesis, we focus on the structure of convolutional neural networks and propose several compression and acceleration methods that build on existing methods in related fields.

To remove the redundancy within a basic structure, we present a hybrid compression and acceleration method for networks of the form "convolutional layers + fully connected layers". We apply a low-rank approximation to the convolutional matrices and sparsify the fully connected layers by pruning, followed by quantization of the whole network. Compared with other works that focus on network compression, our main contribution lies in network acceleration: building on network compression, we show how to implement acceleration with matrix decomposition, network pruning, and quantization at both the algorithm level and the system level. Combined with several engineering techniques, the method is verified on a Chinese character recognition network.

However, this method removes only the redundancy within the basic structure, not the redundancy caused by an unreasonable composition of the network. We therefore propose redesigning a light network, introduce several techniques for designing one, and verify them on a real-time neural style network. It is nevertheless difficult to find a balance between model capacity and the training dataset. Empirically, finding this tipping point requires multiple attempts, which is resource-consuming. A method for speeding up the training of light networks is therefore needed.

To this end, we propose a multi-layer teacher-student learning method based on knowledge transfer, which requires fewer optimization iterations and less training time than training a light network from scratch. The method transfers knowledge from a redundant teacher network to a light student network with a different structure. Its core idea is to split an end-to-end task into several easier sub-tasks that learn the mappings between intermediate feature maps, which narrows the parameter optimization space and speeds up network convergence. The design of the student network is a crucial step in this method. We design different student networks according to their corresponding teacher networks, and take FlowNet and ResNet as examples to verify the speed and convergence of knowledge transfer. In addition, we combine the multi-layer teacher-student learning method with low-rank approximation and network quantization to obtain further compression and acceleration.
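
The three compression steps named in the abstract (low-rank approximation, pruning, quantization) can be illustrated with a small NumPy sketch. This is not the thesis's implementation: the layer shapes, the rank, the sparsity level, the 8-bit symmetric quantizer, and all function names below are assumptions chosen only for illustration, and truncated SVD and magnitude-based pruning are standard stand-ins for whatever decomposition and pruning criterion the thesis actually uses.

```python
import numpy as np

def low_rank_approx(W, rank):
    """Factor W (m x n) into A (m x rank) @ B (rank x n) via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # scale each kept column by its singular value
    B = Vt[:rank, :]
    return A, B

def magnitude_prune(W, sparsity):
    """Zero the entries whose magnitude is at or below the k-th smallest one,
    where k = floor(sparsity * number of entries)."""
    k = int(sparsity * W.size)
    if k == 0:
        return W.copy()
    threshold = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return np.where(np.abs(W) > threshold, W, 0.0)

def uniform_quantize(W, bits=8):
    """Symmetric uniform quantization: returns int8 codes and a float scale
    such that codes * scale approximates W."""
    assert 2 <= bits <= 8, "codes are stored as int8 in this sketch"
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(W)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = np.clip(np.round(W / scale), -qmax, qmax).astype(np.int8)
    return codes, scale

# Hypothetical layers: a conv layer flattened to 64 filters x (32*3*3) inputs,
# and a 256 x 128 fully connected layer. Random weights stand in for trained ones.
rng = np.random.default_rng(0)
conv = rng.standard_normal((64, 288))
fc = rng.standard_normal((256, 128))

A, B = low_rank_approx(conv, rank=16)          # conv is replaced by two thin factors
fc_sparse = magnitude_prune(fc, sparsity=0.9)  # keep roughly 10% of FC weights
codes, scale = uniform_quantize(fc_sparse)     # 8-bit codes plus one float scale

print("conv parameters:", conv.size, "->", A.size + B.size)
print("nonzero FC weights:", np.count_nonzero(fc_sparse), "of", fc.size)
print("max quantization error:", np.max(np.abs(codes * scale - fc_sparse)))
```

Replacing one matrix product by two thinner ones reduces both the parameter count and the multiply-accumulate count when the rank is small, which is one way compression can translate into acceleration. Sparse and quantized weights only speed up inference when the runtime has kernels that exploit them, which is presumably why the abstract separates the algorithm level from the system level.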
Keywords/Search Tags: compression and acceleration of convolutional neural networks, low-rank approximation, pruning, network quantization, knowledge transfer, teacher-student learning