Deep convolutional neural networks have enabled a wide range of computer vision applications that have profoundly changed how people work and live. In recent years, demand for on-device AI applications, such as autonomous driving and industrial anomaly detection, has grown rapidly. The computing and storage capacity of terminal devices is constrained by device size, economic cost, energy consumption, reliability, and other factors. Modern deep neural networks, in contrast, keep growing in model size and hardware requirements; although their accuracy also keeps rising, this growth limits their deployment on resource-constrained devices. On the one hand, some high-accuracy models are difficult to deploy on such devices because of size constraints. On the other hand, some terminal applications demand high real-time performance, and model inference speed is the main bottleneck limiting their use. It is therefore of great practical importance to compress and accelerate deep neural networks while maintaining model accuracy. Common network compression methods include structural pruning, weight pruning, weight quantization, and knowledge distillation, each with its own advantages and disadvantages. Structural pruning is hardware-friendly, simple, effective, and easy to implement, and it is one of the most important research directions in network compression. Although structural pruning has achieved great success, some problems remain: many pruning methods are greedy algorithms based on heuristic or empirical design and lack sufficient theoretical support, and the complexity of pruning algorithms is often too high. To address these problems, this thesis starts from the most intuitive phenomena and proposes corresponding solutions step by step. The main research contents and contributions of this thesis are as follows:

(1) A structural pruning method based on feature shift minimization is proposed, which identifies redundancy in neural networks from a new perspective; its effectiveness is verified through extensive experiments. First, we analyze the reasons for the "abnormal" changes in the pruning-versus-accuracy curves and propose the concept of feature shift. Experiments demonstrate a positive correlation between feature shift and model accuracy, and hence that feature shift can be used for network pruning. Then, to reduce the cost of computing the feature shift, a feature shift estimation algorithm is proposed, whose feasibility is confirmed by detailed ablation experiments. Finally, this thesis proposes a feature distribution optimization algorithm that uses the computed feature shift information to partially recover, without any training, the accuracy lost to pruning, improving the overall efficiency of the pruning algorithm.
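To make the idea concrete, the minimal NumPy sketch below shows one plausible training-free reading of feature shift and its compensation: removing a channel shifts the mean pre-activation of the next layer, and that lost mean contribution can be folded back into the next layer's bias. The function names and the exact estimator here are illustrative assumptions, not the thesis's actual formulation.

```python
import numpy as np

def feature_shift_scores(features, next_weight):
    """Hypothetical per-channel feature-shift estimate: how much removing
    each input channel would shift the next layer's pre-activation mean.

    features:    (N, C) activations feeding the next layer (spatially pooled)
    next_weight: (C_out, C) weights of the next linear / 1x1 layer
    """
    mu = features.mean(axis=0)                        # E[x_c] per channel
    # Removing channel c shifts each next-layer output by -W[:, c] * mu[c];
    # aggregate that shift's magnitude as the channel's importance score.
    shift = np.abs(next_weight) * np.abs(mu)[None, :]  # (C_out, C)
    return shift.sum(axis=0)                           # low score => prunable

def compensate_bias(bias, next_weight, mu, pruned):
    """Fold the pruned channels' lost mean contribution into the next layer's
    bias so the expected pre-activation is preserved without retraining."""
    return bias + next_weight[:, pruned] @ mu[pruned]

# Toy usage with random data (shapes only; real features come from a network).
rng = np.random.default_rng(0)
feats = rng.normal(1.0, 0.5, size=(256, 64))   # N=256 samples, C=64 channels
W = rng.normal(size=(128, 64))
b = np.zeros(128)

scores = feature_shift_scores(feats, W)
pruned = np.argsort(scores)[:16]               # drop the 16 lowest-shift channels
b_new = compensate_bias(b, W, feats.mean(axis=0), pruned)
```

The bias folding step mirrors the abstract's claim that accuracy lost to pruning can be partially recovered from feature shift information alone, without any fine-tuning.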
(2) A global automated pruning method for neural networks based on the true contribution of features is proposed. It removes the limitation of the algorithm in (1) that the compression rate of each layer must be set manually, and it introduces a new pruning criterion defined from the perspective of the network as a whole, improving the applicability and effectiveness of the algorithm. First, starting from three intuitive categories of each layer's output features, this thesis reveals that information is lost to varying degrees as features are transferred through the network, quantitatively analyzes the degree of this loss, and shows experimentally that the loss is fairly stable. Then, building on this observation, a new global pruning criterion is proposed that measures the impact of a channel's output changes on the outputs of all subsequent network layers. Finally, a set of optimization strategies is designed to reduce the complexity of the algorithm, and its effectiveness is demonstrated by extensive experiments.
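The abstract does not give the criterion's closed form; as a hedged illustration only, the following toy sketch scores each hidden channel of a small ReLU MLP with a simple linear-response proxy for its influence on all subsequent layers, then applies a single global threshold so that each layer's compression rate falls out automatically. The proxy and all names are assumptions, not the thesis's actual criterion.

```python
import numpy as np

def global_channel_scores(weights, feats):
    """Hypothetical proxy: magnitude of the change that zeroing a channel
    induces in the final output, propagated linearly through all subsequent
    layers (nonlinearities ignored for simplicity).

    weights: list of matrices W_l with shape (C_{l+1}, C_l)
    feats:   list of activations X_l, shape (N, C_l); feats[l+1] = layer l output
    """
    scores = []
    for l, W in enumerate(weights[:-1]):        # skip the classifier output
        down = np.eye(W.shape[0])               # response of layer l's output
        for Wn in weights[l + 1:]:
            down = Wn @ down                    # accumulate downstream response
        act_mag = np.abs(feats[l + 1]).mean(axis=0)    # E|x_c| per channel
        scores.append(np.abs(down).sum(axis=0) * act_mag)
    return scores

def global_prune_masks(scores, prune_ratio=0.3):
    """One global threshold across all layers; per-layer rates emerge from it."""
    thresh = np.quantile(np.concatenate(scores), prune_ratio)
    return [s >= thresh for s in scores]        # True = keep channel

# Toy 3-layer ReLU MLP on random data, shapes only.
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(48, 64)), rng.normal(size=(32, 48)), rng.normal(size=(10, 32))]
feats = [rng.normal(size=(128, 64))]
for W in Ws:
    feats.append(np.maximum(feats[-1] @ W.T, 0.0))

masks = global_prune_masks(global_channel_scores(Ws, feats))
print([int(m.sum()) for m in masks])            # channels kept per hidden layer
```

Because the threshold is taken over the concatenated scores of every layer, no per-layer compression rate is specified by hand, which is the behavior the global automated method aims for.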