
Model Compression And Implementation Based On Information Measurement

Posted on: 2023-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: L S Shao
Full Text: PDF
GTID: 2568306812464274
Subject: Computer technology
Abstract/Summary:
At present, deep convolutional neural networks are the mainstream algorithms in artificial intelligence applications. However, their computational cost is huge, and the processing capability of edge devices struggles to meet the algorithms' real-time requirements, which hinders practical engineering deployment. The difficulty of deploying to edge devices such as mobile phones and robots is generally addressed by improving algorithmic efficiency. As one such approach, model compression can effectively reduce model redundancy while achieving performance comparable to, or even better than, the original model, so that the compressed model can be easily deployed on resource-constrained devices. Model compression has therefore become a research hotspot.

Pruning is one of the most widely used model compression methods. Its advantages are as follows: 1) the pruned model remains structured (hence "structured pruning") and is well supported by regular hardware and off-the-shelf basic linear algebra subprogram (BLAS) libraries; 2) storage usage and computational cost are significantly reduced during online inference; 3) it can be further combined with other compression methods, such as network quantization, low-rank factorization, and weight pruning, to achieve deeper compression and acceleration.

Although many pruning works have achieved good results, two essential problems remain in pruning algorithms: (1) The pruned network structure depends on the per-layer pruning rate. Different layers require different pruning rates, and setting these rates has been shown to significantly affect final performance, yet there is no deterministic algorithm for determining the per-layer pruning rate. (2) The filter importance measurement identifies which filters of the pre-trained model should be preserved and inherited to initialize the pruned network structure. Initializing the pruned network with different filters' weights leads to different performance, and there is currently no in-depth study of how to determine filter importance.

In view of the above problems, this thesis focuses on filter pruning for image classification and applies it to practical object detection tasks. The main research work is divided into the following three parts:

1. A pruning method based on feature-map information measurement, and a parallel pruning method derived from it, are proposed. The former reflects the importance of a filter by exploring the information contained in its output feature map: on the premise that the richer the feature information, the more important the corresponding filter, the information entropy of the feature map is used to measure its feature information. The smaller the entropy, the richer the information contained in the feature map, which provides the evaluation of the corresponding filter. Normalization is applied so that scores can be compared across layers, enabling one-shot pruning of the whole network. The parallel pruning algorithm combines the advantages of multiple pruning methods, making the pruned network better than any single method in terms of parameters and FLOPs. Experimental results show that the feature-map information measurement method reduces ResNet50's parameters by 55.3% and its FLOPs by 55% with only 4.13% accuracy loss on ImageNet. On CIFAR10, it prunes 64.2% of DenseNet40's parameters and 61.8% of its FLOPs with only a 0.22% accuracy drop, while the parallel pruning method removes 65.1% of the parameters and 65.5% of the FLOPs from DenseNet40 with only a 0.51% accuracy drop.

2. A pruning method based on filter similarity is proposed. Taking filters as the research object, similarity is used to evaluate whether filters can replace one another: the higher the similarity, the higher the probability that a filter can be replaced by another, in which case it is considered redundant and pruned. The main steps are as follows: first, the L1 norm, a common importance criterion for pruning, is used to select the target filters; then, the Euclidean distance is used to measure the similarity between filters. A filter highly similar to a target filter is considered replaceable by it and is therefore pruned. On the CIFAR10 dataset, this algorithm derives a compact ResNet56 with 0.53M parameters and 79.34M FLOPs while the accuracy drops only 1.28%. The method is also successfully applied to a face detection and recognition task, demonstrating its effectiveness.

3. Research on real-time deployment of the pruning algorithms is carried out, and the three pruning methods are applied to practical tasks on the single-stage YOLO-series object detection networks and the two-stage Faster R-CNN object detection network. On the VOC dataset, with a pruning rate of 70%, the parallel pruning method lowers the accuracy of the pruned YOLOv3 network by only 0.067 relative to the original, while the detection speed reaches 25 FPS, 14 FPS faster than before pruning. For the YOLOv4 and Faster R-CNN networks, accelerated inference is likewise achieved with little loss of accuracy. For a self-built drone dataset, the YOLOv3 network is used for real-time object detection and deployed on the TX2 hardware platform; combined with the TensorRT tool, the speed reaches 24 FPS. The experimental results confirm that the pruning methods proposed in this thesis can be successfully applied to real embedded platforms, with practical effectiveness.
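The first method above — entropy of a feature map as a filter-importance score, normalized for cross-layer comparison, followed by one-shot pruning — can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the histogram binning scheme, the min-max normalization, and the global ranking step are assumptions filled in for concreteness. Following the abstract's stated premise that smaller entropy means a richer (more important) feature map, the sketch prunes the highest-entropy filters first.

```python
import math

def feature_map_entropy(fmap, num_bins=16):
    """Shannon entropy of a histogram of one channel's activation values.

    `fmap` is a flat list of activations from a single output channel.
    The binning scheme and `num_bins` are illustrative choices.
    """
    lo, hi = min(fmap), max(fmap)
    if hi == lo:                       # constant map: a single bin, zero entropy
        return 0.0
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in fmap:
        counts[min(int((v - lo) / width), num_bins - 1)] += 1
    n = len(fmap)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def normalized_scores(entropies):
    """Min-max normalize one layer's entropies into [0, 1] so that scores
    from different layers become comparable."""
    lo, hi = min(entropies), max(entropies)
    span = (hi - lo) or 1.0
    return [(e - lo) / span for e in entropies]

def one_shot_prune(layer_entropies, global_ratio):
    """Rank every filter in the network by its normalized score and mark
    the top `global_ratio` fraction for removal in a single pass.

    Per the abstract's premise, larger entropy = less important, so the
    highest-entropy filters are pruned first.
    """
    scored = []
    for layer_idx, ents in enumerate(layer_entropies):
        for filt_idx, s in enumerate(normalized_scores(ents)):
            scored.append((s, layer_idx, filt_idx))
    scored.sort(reverse=True)          # highest normalized entropy first
    k = int(len(scored) * global_ratio)
    return {(l, f) for _, l, f in scored[:k]}
```

For example, with two layers whose per-filter entropies are `[[0.1, 0.9, 0.5], [0.2, 0.8]]`, `one_shot_prune(..., 0.4)` removes the two filters whose normalized entropy is largest — one from each layer — in a single pass, with no per-layer pruning rate to tune.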
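The filter-similarity method of part 2 — L1-norm selection of target filters, then Euclidean-distance comparison to find redundant (replaceable) filters — can likewise be sketched for a single layer. The `num_targets` and `sim_threshold` hyperparameters below are illustrative placeholders; the abstract does not specify how many targets are kept or what distance counts as "highly similar".

```python
import math

def l1_norm(filt):
    """L1 norm of a filter's flattened weights — the common
    magnitude-based criterion used here to pick target filters."""
    return sum(abs(w) for w in filt)

def euclidean(a, b):
    """Euclidean distance between two flattened filters."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_prune(filters, num_targets, sim_threshold):
    """Sketch of similarity-based pruning for one layer.

    1. Keep the `num_targets` filters with the largest L1 norm as targets.
    2. Any remaining filter whose Euclidean distance to some target falls
       below `sim_threshold` is judged replaceable by that target, hence
       redundant, and is marked for pruning.
    """
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]), reverse=True)
    targets = set(ranked[:num_targets])
    pruned = set()
    for i in range(len(filters)):
        if i in targets:
            continue
        if any(euclidean(filters[i], filters[t]) < sim_threshold
               for t in targets):
            pruned.add(i)
    return targets, pruned
```

For instance, given filters `[[1.0, 1.0], [0.9, 1.1], [-0.1, 0.1], [2.0, 2.0]]` with two targets and a threshold of 0.5, the high-magnitude filters at indices 3 and 0 become targets, index 1 is pruned as nearly identical to index 0, and the dissimilar low-magnitude filter at index 2 is left alone — illustrating how the similarity criterion prunes redundancy rather than pure magnitude.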
Keywords/Search Tags: Model compression, Filter pruning, Deployment, Object detection, TX2