
Research On Deep Neural Network Compression And Acceleration Based On Channel Pruning

Posted on: 2023-08-11    Degree: Doctor    Type: Dissertation
Country: China    Candidate: H J Cheng    Full Text: PDF
GTID: 1528307061974289    Subject: Control Science and Engineering
Abstract/Summary:
Since the proposal of the perceptron and the back-propagation algorithm, artificial neural networks (ANNs) have become an increasingly powerful tool in artificial intelligence, with many successful applications. In the past decades, deep neural networks (DNNs), as an extension of traditional ANNs, have drawn considerable research interest from both academia and industry and have achieved great success in application areas such as recognition and signal and information processing. However, as DNN models grow more complex and their parameter counts and computation costs increase, deploying deep neural networks on mobile devices becomes a great challenge, which has driven vigorous research on network compression. The goal of network compression is to reduce network parameters and speed up inference without degrading model performance. Existing network compression research suffers from high time costs and from a lack of studies on multi-task network compression. This dissertation addresses the challenges of both single-model and multi-model compression, and studies efficient compression algorithms for deep neural networks from three aspects: 1) removing the redundant information of filters in the network; 2) removing the redundant filters in the network; 3) removing the redundant structures of the network. The main contributions and research results of this dissertation are as follows:

(1) A deep weighted sparse network based on multi-objective optimization is proposed to adaptively remove the redundant information of filters in the network. Traditional sparsity regularization usually imposes the same sparsity constraint on all neurons in the network, which may cause some highly activated filters to lose useful information, while some lowly activated filters with redundant information retain that redundancy under the same constraint. In this dissertation, a weighted sparsity constraint is introduced into DNN models to reduce the redundant information of filters more effectively and to force the network to concentrate the effective information into a subset of filters, which facilitates network compression. A multi-objective optimization model is established to adaptively select the dynamic hyperparameters that weight the sparsity constraints; the network update used for evaluation is divided into two parts, reconstruction-error optimization and sparse-gradient calculation, to reduce the time cost of updating; and the time cost is further reduced by adopting a hyperparameter-sharing strategy during training. Experimental results show that, at the same compression degree, pruning the weighted sparse network models yields compressed models with higher accuracy.
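As a rough illustration of the weighted sparsity idea (a minimal sketch, not the dissertation's exact formulation), the following PyTorch snippet applies a per-filter weighted L1 penalty to batch-normalization scale factors, a common proxy for channel importance. The per-filter weights stand in for the sparsity-constraint weights that the dissertation selects adaptively by multi-objective optimization; all names here (weighted_sparsity_penalty, filter_weights, lam) are illustrative assumptions.

import torch
import torch.nn as nn

def weighted_sparsity_penalty(model, filter_weights, lam=1e-4):
    """Weighted L1 penalty on BN scale factors (one weight per filter).

    filter_weights: dict mapping each BatchNorm2d module to a tensor of
    per-channel weights; in the dissertation these weights are chosen
    adaptively by multi-objective optimization, here they are simply given.
    """
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d) and m in filter_weights:
            penalty = penalty + (filter_weights[m] * m.weight.abs()).sum()
    return lam * penalty

# usage inside a training step (assumed names: model, criterion, inputs, targets):
# loss = criterion(model(inputs), targets) + weighted_sparsity_penalty(model, filter_weights)

In this form the penalty is simply added to the task loss during training, after which channels with small scale factors can be pruned.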
(2) A multi-task pruning algorithm based on filter index sharing is proposed to remove redundant filters across multiple models. Most existing network pruning methods operate on a single network structure, that is, a deep neural network model is built on a single database and then pruned. This dissertation observes that models trained on multiple databases can extract similar features; sharing these filters not only realizes model compression but also improves the performance of the compressed models through information interaction among the different databases. Criteria-based multi-model compression algorithms face the challenges of designing filter-importance criteria, selecting the operations applied to filters, allocating the compression degree among models, restoring the accuracy of the compressed models, and extending the compression algorithm. To overcome these difficulties, this dissertation designs filter-importance metrics for multi-model compression and selects important filters by multi-objective optimization; a filter-sharing strategy is proposed to adaptively determine the operation (pruning, merging, or keeping) applied to each filter in the network; an index matrix is set up to store the corresponding operations and to guide the accuracy recovery of the compressed models; and, to address the allocation of the compression degree among models, two different ways of setting the multi-model compression degree are provided. Experimental results show that the proposed algorithm performs well in single-model, two-model, and multi-model compression. After fine-tuning, each compressed model obtained by the multi-model compression algorithm has higher accuracy than the model obtained by the single-model compression algorithm under the same compression degree.

(3) A novel differentiable channel pruning algorithm guided by an attention mechanism is proposed to optimize the pruning strategy. Neural architecture search (NAS) algorithms can make full use of network structure information, but they often suffer from a large search space and slow search speed. To overcome these shortcomings, Gumbel-softmax sampling is introduced to make the optimization of the pruning strategy differentiable, and attention scores are used to provide prior information for the NAS strategy optimization. Since directly inserting attention modules into the network would increase the parameter count and computation cost, a two-stage iterative training method is designed to alternately optimize the attention modules and the network parameters, so that the attention modules can be removed from the model without affecting its accuracy. Special NAS strategies are designed for network modules with shortcuts, such as those in ResNet, so that the network can be compressed at both the width and depth levels. Finally, the algorithm is extended to multi-model compression. Experimental and visualization results show the effectiveness of the layer-wise information, and the proposed algorithm obtains compressed models with lower accuracy loss.
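To illustrate how Gumbel-softmax sampling can make a channel-level pruning decision differentiable, the following sketch relaxes a per-channel keep/prune choice and, optionally, initializes the gate logits from attention scores so that attention provides prior information for the search. The module name DifferentiableChannelGate and the exact parameterization are assumptions for illustration, not the dissertation's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentiableChannelGate(nn.Module):
    """Per-channel keep/prune gate relaxed with Gumbel-softmax sampling."""

    def __init__(self, num_channels, attention_scores=None, tau=1.0):
        super().__init__()
        init = torch.zeros(num_channels, 2)
        if attention_scores is not None:
            init[:, 0] = attention_scores    # prior logit for "keep"
            init[:, 1] = -attention_scores   # prior logit for "prune"
        self.logits = nn.Parameter(init)
        self.tau = tau

    def forward(self, x):                    # x: (N, C, H, W)
        # hard=True gives one-hot decisions in the forward pass while
        # keeping gradients from the soft relaxation in the backward pass
        gate = F.gumbel_softmax(self.logits, tau=self.tau, hard=True)[:, 0]
        return x * gate.view(1, -1, 1, 1)

Because the relaxed gate is differentiable, the pruning strategy can be optimized jointly with the network weights by ordinary gradient descent.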
(4) A compact multi-task network based on a hypernetwork is proposed, in which the parameters of the compact multi-task models are generated by the hypernetwork. To make full use of the correlation among different tasks and to avoid the cost of retraining after multi-task model compression, a hypernetwork is designed to generate network parameters adaptively. The hypernetwork consists of two modules: task-specific feature extractors and a task-shared parameter generator. Specifically, the task-specific feature extractors extract the corresponding features for each task, which serve as the input to the parameter generator; the parameter generator then provides the network parameters for the multi-task models. Since the tasks in multi-task learning are related, the parameter generator can effectively learn the mapping between features and network parameters. Two methods for selecting the features of the compact multi-task model are provided: directly using the mean feature vectors of the tasks, or using an evolutionary algorithm to select feature vectors. The hypernetwork automatically generates the network parameters without fine-tuning or retraining the multi-task models, which greatly reduces training time. The multi-task experimental results show that, at the same compression degree, the proposed hypernetwork generates parameters for the multi-task network with better performance.
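The following minimal sketch illustrates the hypernetwork idea under stated assumptions: a small multilayer perceptron (here called TaskHypernetwork, an illustrative name) maps a task-specific feature vector to the weights of one convolution layer of the compact model, so that those weights need no per-task fine-tuning. The layer sizes and the single-layer scope are assumptions for illustration, not the dissertation's architecture.

import torch
import torch.nn as nn

class TaskHypernetwork(nn.Module):
    """Task-shared parameter generator: task feature -> conv-layer weights."""

    def __init__(self, feat_dim, out_ch, in_ch, k):
        super().__init__()
        self.out_shape = (out_ch, in_ch, k, k)
        self.generator = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, out_ch * in_ch * k * k),
        )

    def forward(self, task_feature):         # task_feature: (feat_dim,)
        return self.generator(task_feature).view(self.out_shape)

# usage sketch: generate the weights of one layer of a task's compact model
# hyper = TaskHypernetwork(feat_dim=128, out_ch=32, in_ch=16, k=3)
# w = hyper(task_feature)                    # then e.g. F.conv2d(x, w, padding=1)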
Keywords/Search Tags:Deep neural network, Network compression, Multi-objective optimization, Multi-task learning, Pruning, Neural architecture search, Hypernetwork