In recent years, deep learning has developed rapidly, and deep neural networks have achieved strong performance in many research fields. However, model complexity keeps rising and the number of model parameters keeps growing, so these models require large amounts of data for training. Owing to their structural characteristics, the models contain many redundant parameters and demand substantial computing resources, which hinders their deployment and inference on mobile terminals. Model compression methods such as model quantization can reduce this information redundancy and model complexity. Yet while reducing model complexity brings efficiency benefits, it inevitably causes a loss of accuracy, so a key problem is how to compensate for that accuracy loss as much as possible while the model is being compressed. This thesis focuses on model quantization and other model compression methods for convolutional neural networks. Its main contents and contributions are as follows:

(1) To address the information redundancy of model parameters and the high accuracy loss of direct quantization, this thesis proposes a progressive quantization optimization method based on knowledge distillation. The method takes a full-precision convolutional neural network as the teacher network and a quantized convolutional neural network as the student network, applies the idea of knowledge distillation to conduct teacher-student guided training through relative entropy (KL divergence) or feature-map transfer (a minimal sketch of such a loss is given after this abstract), and proceeds progressively, for example through progressive quantization of the model, progressive quantization levels, and gradual quantization of individual components. Multi-dimensional analysis, experimental results, and comparisons with other methods show that the proposed method effectively preserves the performance of the original model while reducing the accuracy loss of the quantized model; it also suggests directions for future work on quantization optimization.

(2) To address the information redundancy caused by model structure, the large number of model parameters, and the high quantization accuracy loss of compact models, this thesis proposes another progressive quantization optimization method, based on low-rank decomposition. The method applies matrix decomposition to split each original convolutional layer into multiple quantized convolutional layers (a sketch of one such splitting also follows this abstract), and uses tensor decomposition to tensorize the fully connected layers, balancing the number of network parameters against model accuracy. The hyperparameters involved, such as the rank of the weight matrix, do not require careful tuning. In addition, to make the theory of low-rank decomposition readily usable in deep learning research, a low-rank decomposition toolkit is designed and developed; it provides the basic mathematical operations required for low-rank decomposition as well as a variety of low-rank decomposition algorithms. The toolkit is highly modular, can be embedded in deep learning development frameworks, and is validated through experimental testing and analysis. Finally, the proposed method is evaluated on multiple datasets and tasks and achieves good experimental results; comparison with state-of-the-art methods demonstrates its advantages, and it also suggests directions for further compression of neural network models.
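To make the distillation-guided quantization training in contribution (1) concrete, the following is a minimal sketch, assuming PyTorch. The names teacher, student, and optimizer, and the hyperparameters T (softmax temperature) and alpha (loss weighting), are illustrative assumptions rather than details taken from the thesis; the thesis's own training procedure may differ.

```python
# Minimal sketch of one distillation-guided training step for a quantized
# student network. Assumes PyTorch; teacher/student/optimizer and the
# hyperparameters T and alpha are illustrative, not from the thesis.
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, images, labels, optimizer,
                      T=4.0, alpha=0.7):
    """Cross-entropy on the labels plus relative entropy (KL divergence)
    between temperature-softened teacher and student logits."""
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(images)      # full-precision teacher guidance
    s_logits = student(images)          # quantized student forward pass

    ce = F.cross_entropy(s_logits, labels)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * (T * T)  # standard T^2 rescaling

    loss = alpha * kd + (1.0 - alpha) * ce
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The same step can be wrapped in a progressive schedule, for example fine-tuning the student at successively lower bit widths, which corresponds to the progressive quantization levels mentioned above.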
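Likewise, the matrix-decomposition step in contribution (2) can be illustrated by one common low-rank splitting: reshape a convolution's weight tensor into a matrix, apply truncated SVD, and replace the layer with a k-by-k convolution of `rank` output channels followed by a 1-by-1 convolution. This is a sketch under those assumptions; the function name decompose_conv is hypothetical, and the thesis's actual decomposition and its tensorized treatment of fully connected layers may differ.

```python
# Sketch of splitting one convolutional layer into two low-rank layers via
# truncated SVD. Assumes PyTorch; decompose_conv is a hypothetical helper.
import torch
import torch.nn as nn

def decompose_conv(conv: nn.Conv2d, rank: int) -> nn.Sequential:
    W = conv.weight.data                       # shape (out, in, kH, kW)
    out_c, in_c, kh, kw = W.shape
    W2d = W.reshape(out_c, in_c * kh * kw)     # flatten to a matrix
    U, S, Vh = torch.linalg.svd(W2d, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]               # fold singular values into U
    V_r = Vh[:rank, :]

    # First layer: k-by-k conv producing `rank` intermediate channels.
    first = nn.Conv2d(in_c, rank, (kh, kw), stride=conv.stride,
                      padding=conv.padding, bias=False)
    # Second layer: 1-by-1 conv mixing them back to `out_c` channels.
    second = nn.Conv2d(rank, out_c, 1, bias=conv.bias is not None)
    first.weight.data = V_r.reshape(rank, in_c, kh, kw)
    second.weight.data = U_r.reshape(out_c, rank, 1, 1)
    if conv.bias is not None:
        second.bias.data = conv.bias.data.clone()
    return nn.Sequential(first, second)
```

Choosing the rank trades parameter count against reconstruction error: the two resulting layers store rank * (in_c * kh * kw + out_c) weights in place of the original out_c * in_c * kh * kw, and each factor can then be quantized separately as the abstract describes.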