In recent years, thanks to their low price, small size, and ease of operation, UAVs have been widely used in industries such as aerial photography and logistics. However, while small drones bring convenience, unauthorized "overflying" and "black flying" also pose serious threats to personal privacy, public safety, and aviation safety. Governments, militaries, academia, and industry at home and abroad have therefore carried out research on anti-UAV systems in recent years to detect, locate, and counter UAVs. Among the various UAV detection methods, video detection has the advantages of low cost and strong scene adaptability, but the small size and weak appearance features of UAVs pose challenges for it. This paper designs and implements a spatio-temporal feature fusion method for the visual detection of small UAVs, together with model compression and a system-level application. The main research contents of this paper are as follows:

(1) A spatio-temporal feature fusion method for UAV video detection is proposed to address the problems that the appearance features of UAVs are weak and their motion features are easily disturbed by background dynamics. First, optical flow feature maps are computed for adjacent frames; then a temporal feature module composed of 3D convolutions is designed to fuse the multi-frame optical flow; finally, a spatio-temporal feature fusion mechanism is designed to introduce the temporal features into the YOLOv3 object detection framework. Considering the high computational cost of dense optical flow, which is commonly used for motion information extraction, this paper instead computes sparse optical flow at key feature points in the image and designs a sparse optical flow feature map. In addition, to prevent the drone's feature points from being overwhelmed by points from a complex background environment, this paper improves the Shi-Tomasi feature point extraction method, effectively suppressing the large number of feature points in the static background and improving the quality of the temporal features.

(2) To address the problem that neural network inference consumes substantial resources while the system's computing resources are limited, this paper attacks the problem from multiple angles. For the temporal feature module, the TSM module is used for model compression, and a sparsity-based method is used to perform channel pruning and layer pruning on the backbone network. Finally, the various compression strategies are evaluated experimentally on the UAV dataset.

(3) The overall system is designed and implemented. After the algorithm design, in order to train the model parameters, a UAV video database is built, and data augmentation and a semi-automatic annotation method for video sequences are designed. During model training, a staged strategy is used to maximize the effect of transfer learning. When processing the surveillance feed, in order to improve detection performance and reduce the load on system hardware resources, a screen sub-region detection mechanism is designed.
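To make the sparse optical flow feature map in (1) concrete, the following is a minimal pure-Python sketch of one plausible rasterization step: tracked point displacements between two adjacent frames are binned into a coarse 2-channel grid holding the mean (dx, dy) per cell. The function name, grid layout, and averaging scheme are illustrative assumptions, not the thesis's exact design.

```python
# Sketch: rasterize sparse optical flow vectors into a coarse 2-channel
# feature map. All names and the grid layout are illustrative, not the
# exact construction used in the thesis.

def flow_to_feature_map(tracks, frame_w, frame_h, grid_w, grid_h):
    """tracks: list of ((x0, y0), (x1, y1)) point matches between two
    adjacent frames. Returns a grid of shape [grid_h][grid_w][2] holding
    the mean displacement (dx, dy) of the points falling in each cell."""
    sums = [[[0.0, 0.0] for _ in range(grid_w)] for _ in range(grid_h)]
    counts = [[0 for _ in range(grid_w)] for _ in range(grid_h)]
    cell_w = frame_w / grid_w
    cell_h = frame_h / grid_h
    for (x0, y0), (x1, y1) in tracks:
        # bin the vector by the location of its starting point
        cx = min(int(x0 // cell_w), grid_w - 1)
        cy = min(int(y0 // cell_h), grid_h - 1)
        sums[cy][cx][0] += x1 - x0
        sums[cy][cx][1] += y1 - y0
        counts[cy][cx] += 1
    for gy in range(grid_h):
        for gx in range(grid_w):
            if counts[gy][gx]:
                sums[gy][gx][0] /= counts[gy][gx]
                sums[gy][gx][1] /= counts[gy][gx]
    return sums
```

In practice the point matches would come from a sparse tracker such as pyramidal Lucas-Kanade applied to Shi-Tomasi corners, so the map stays cheap to compute compared with dense optical flow.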
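The sparsity-based channel pruning in (2) can be illustrated under one common assumption: as in "network slimming"-style methods, each channel carries a batch-norm scaling factor, and the channels with the globally smallest factors are removed. The function below only selects which channel indices to keep; the names, the global-ratio rule, and the keep-one-channel safeguard are assumptions for illustration.

```python
# Sketch: choose channels to keep for sparsity-based channel pruning,
# assuming per-channel batch-norm scaling factors (network-slimming
# style). Names and the global thresholding rule are illustrative.

def select_channels(bn_scales, prune_ratio):
    """bn_scales: {layer_name: [gamma, ...]}. Returns {layer_name:
    sorted indices of channels to keep} after pruning the globally
    smallest `prune_ratio` fraction of scaling factors."""
    all_scales = sorted(g for scales in bn_scales.values() for g in scales)
    cut = int(len(all_scales) * prune_ratio)
    threshold = all_scales[cut - 1] if cut > 0 else float("-inf")
    keep = {}
    for layer, scales in bn_scales.items():
        kept = [i for i, g in enumerate(scales) if g > threshold]
        # safeguard: never prune a layer entirely, keep its strongest channel
        if not kept:
            kept = [max(range(len(scales)), key=lambda i: scales[i])]
        keep[layer] = kept
    return keep
```

After selection, the corresponding convolution filters and batch-norm parameters would be copied into a slimmer network and fine-tuned on the UAV dataset.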
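The screen sub-region detection mechanism in (3) can be sketched as a tiling loop: the frame is divided into overlapping sub-regions, each sub-region is passed to the detector at its native input size, and sub-region-local boxes are shifted back into full-frame coordinates. The tile size, overlap, and the `detect` callback signature are illustrative assumptions.

```python
# Sketch: screen sub-region detection. Overlapping tiles cover the
# frame; each tile is detected independently and its boxes are mapped
# back to full-frame coordinates. Tile size/overlap are illustrative.

def make_tiles(frame_w, frame_h, tile, overlap):
    """Return (x, y, w, h) sub-regions covering the frame, with the
    given overlap between neighbouring tiles."""
    step = tile - overlap
    xs = list(range(0, max(frame_w - tile, 0) + 1, step))
    ys = list(range(0, max(frame_h - tile, 0) + 1, step))
    # add a final tile flush with the right/bottom edge if uncovered
    if xs[-1] + tile < frame_w:
        xs.append(frame_w - tile)
    if ys[-1] + tile < frame_h:
        ys.append(frame_h - tile)
    return [(x, y, tile, tile) for y in ys for x in xs]

def detect_by_subregion(frame_w, frame_h, detect, tile=416, overlap=64):
    """detect(x, y, w, h) -> list of (bx, by, bw, bh) boxes local to the
    tile; returns all boxes in full-frame coordinates."""
    boxes = []
    for x, y, w, h in make_tiles(frame_w, frame_h, tile, overlap):
        for bx, by, bw, bh in detect(x, y, w, h):
            boxes.append((x + bx, y + by, bw, bh))
    return boxes
```

A real system would additionally merge duplicate detections in the overlap bands (e.g. with non-maximum suppression); that step is omitted here for brevity.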