| Object detection is a research hotspot in computer vision. The main task is to locate all objects of interest in an image or video and to assign a category to each object. In recent years, object detection has found many mature applications across computer vision, such as autonomous driving, image retrieval, video surveillance, and information collection. Object detection algorithms based on traditional image processing and machine learning usually rely on hand-designed features and are trained on small samples. They are often affected by factors such as lighting, occlusion, and environmental changes, which ultimately leads to poor detection performance. Compared with traditional algorithms, object detection algorithms based on deep learning have outstanding performance advantages, but deep learning also has shortcomings. Although high-complexity deep learning models perform better, their high storage and computing resource consumption makes them difficult to deploy effectively on mobile and embedded devices with limited size and power budgets. The large amount of computation prevents the neural network model from running in real time on such devices. In response to this problem, many researchers study model compression and acceleration algorithms that eliminate redundant information in neural networks. Therefore, this thesis proposes applying model compression to the SSD (Single Shot MultiBox Detector) object detection model to reduce its memory footprint, speed up inference, and save energy. The specific work is as follows: 1) Construction and compression of the backbone network model. In this thesis, the SSD backbone network is replaced with a Densely Connected Convolutional Network (DenseNet). DenseNet fuses high-level and low-level features in each Dense Block, and this fusion of high-level and low-level features is suitable for object
detection. The backbone network is trained on the ImageNet 2012 data set. After training is completed, the backbone network model is compressed. Using a structured pruning method, the importance of each feature map channel is evaluated according to the parameter γ of the network's batch normalization layers. Unimportant feature map channels are cut off, and the corresponding convolution kernels are removed as well. Finally, the pruned network is fine-tuned to improve its classification accuracy. The experimental results show that when the parameters of the backbone network model are reduced by half using structured pruning, the network suffers no loss of accuracy. 2) Efficiently combining the compressed backbone network model with SSD to build a CPSSD model. This thesis uses six smaller-scale feature maps (19 × 19, 10 × 10, 5 × 5, 3 × 3, 2 × 2, 1 × 1) for prediction. To further improve the inference speed of the algorithm, the number of default boxes is reduced on the 19 × 19 feature map. The 19 × 19 and 10 × 10 feature maps are taken directly from the backbone network. To generate the 4 additional feature maps, this thesis uses depthwise separable convolution, which decomposes a standard convolution into a depthwise convolution and a pointwise convolution. This decomposition effectively reduces both the amount of computation and the size of the model. In the final prediction, a 1 × 1 convolution kernel replaces the 3 × 3 convolution kernel to predict the class scores and the box positions. To balance the inference speed and accuracy of the model, a residual block (ResBlock) is placed before the prediction layer of each detection feature map. The algorithm is trained on the training and validation sets of PASCAL VOC2007 and VOC2012 and then tested on the PASCAL VOC2007 test set. The experimental results show that the proposed algorithm can reduce the size of the SSD model by a factor of 2.8, maintain the mean Average
Precision (mAP) of the model without loss, and improve the detection speed of the model. |
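
The channel selection step of the structured pruning described in the abstract can be illustrated with a minimal sketch. The abstract states only that channel importance is judged by the batch normalization parameter γ; the global ranking by |γ|, the function name `channel_keep_masks`, the safeguard that keeps at least one channel per layer, and the example values are all assumptions made for illustration, not the thesis implementation.

```python
def channel_keep_masks(gammas_per_layer, prune_ratio):
    """Return one keep-mask (list of bools) per layer, based on BN scale factors.

    gammas_per_layer: list of lists, the gamma of each channel in each BN layer.
    prune_ratio: fraction of all channels to remove, in [0, 1).
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")

    # Assumption: channels are ranked across the whole network by |gamma|.
    scores = sorted(abs(g) for layer in gammas_per_layer for g in layer)
    if not scores:
        return [[] for _ in gammas_per_layer]

    # Channels whose |gamma| falls below this value are treated as unimportant.
    threshold = scores[int(len(scores) * prune_ratio)]

    masks = []
    for layer in gammas_per_layer:
        mask = [abs(g) >= threshold for g in layer]
        if layer and not any(mask):
            # Safeguard (assumption): never remove every channel of a layer.
            strongest = max(range(len(layer)), key=lambda i: abs(layer[i]))
            mask[strongest] = True
        masks.append(mask)
    return masks


# Hypothetical gamma values for two BN layers with four channels each.
example_gammas = [[0.9, 0.01, 0.5, 0.02], [0.03, 0.7, 0.8, 0.04]]
example_masks = channel_keep_masks(example_gammas, prune_ratio=0.5)
print(example_masks)
```

In a real network, each mask would then be used to drop the matching output filters of the preceding convolution and the matching input channels of the following one, after which the pruned model is fine-tuned.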
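
To make the saving from depthwise separable convolution concrete, the following sketch counts multiply-accumulate operations for a standard convolution and for its depthwise plus pointwise decomposition. The layer dimensions in the example (3 × 3 kernel, 256 input and output channels, 5 × 5 output map) are hypothetical values chosen for illustration and are not taken from the thesis.

```python
def standard_conv_macs(k, c_in, c_out, h_out, w_out):
    """Multiply-accumulates of a standard k x k convolution."""
    return k * k * c_in * c_out * h_out * w_out


def depthwise_separable_macs(k, c_in, c_out, h_out, w_out):
    """Multiply-accumulates of a depthwise k x k plus a pointwise 1 x 1 convolution."""
    depthwise = k * k * c_in * h_out * w_out   # one k x k filter per input channel
    pointwise = c_in * c_out * h_out * w_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 256 -> 256 channels, 5 x 5 output map.
standard = standard_conv_macs(3, 256, 256, 5, 5)
separable = depthwise_separable_macs(3, 256, 256, 5, 5)
print(standard, separable, separable / standard)
```

The ratio of the two counts equals 1/c_out + 1/k², so for a 3 × 3 kernel with many output channels the decomposition needs roughly one ninth of the computation of the standard convolution.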