With rising living standards, consumer-grade UAVs are increasingly used in many fields. Although UAVs have made life more colorful and convenient, improper operation and unauthorized "black flying" have also caused many security problems. To address the increasingly severe challenge of UAV prevention, this thesis applies deep learning to the visual detection of dim, small UAVs with "low, small, and slow" characteristics, providing algorithmic support for anti-UAV systems. The main work is as follows:

1. Because existing color visual object detection algorithms cannot recognize dim, small UAVs in real time with high accuracy, this thesis proposes an improved YOLOv4 (You Only Look Once) network model for UAV detection. To detect dim, small objects, the model removes some deep residual units of CSPDarknet53 to capture more fine-grained information and constructs a backbone feature extraction network that processes large feature maps. Meanwhile, the model builds an aggregated feature pyramid by shifting feature maps and cutting the output features of the fifth large residual module, which makes the backbone network shallower. Furthermore, the model enlarges the feature map at the bottom of the pyramid by a factor of 4, removes the high-level low-resolution features, and then fuses the high-level and middle-level feature information to achieve real-time detection of dim, small UAV objects. Experiments show that, compared with the original algorithm, the improved algorithm increases accuracy by 2.8%, recall by 4.7%, and FPS by 6.3.

2. To detect UAVs robustly in video, this thesis proposes a video detection algorithm that aggregates spatiotemporal information. First, the model divides the input video stream into key frames and non-key frames, computes the backbone features of key frames to extract the feature pyramid, and only
calculates the low-level backbone features in the non-key frames. A feature stream calculation module is proposed to aggregate the high-level pyramid features of key frames with the low-level backbone features of non-key frames; this enhances the overall detection effect by exploiting key-frame features and omits the extraction of high-level features in non-key frames to reduce detection time. Results show that, compared with the single-frame object detection network, the proposed video detection network increases accuracy by 4.1%, recall by 6.3%, and FPS by 5.2.

3. This thesis proposes a bi-modal joint learning network based on Siamese networks to improve object detection under insufficient lighting. The network extracts independent features from color and infrared images via the Siamese network and introduces a feature fusion module that processes the feature maps output by each layer of the network, so that both modal features are used. Finally, the network further fuses the contributions of the fused features and the independent features through supervised learning. Experimental results show that the proposed method effectively improves UAV detection for single-modality images or video in complex environments. Compared with the single-input network, accuracy increases by 2.6% and recall by 5.1%.
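The bi-modal idea in item 3 can be illustrated with a minimal sketch: a Siamese branch means the same weights process both the color and infrared inputs, after which a fusion module combines the two feature streams. The function names (`shared_backbone`, `fuse`), the single linear-plus-ReLU "branch", and the scalar mixing weight `alpha` are all simplifications assumed for illustration; the thesis' actual network is a multi-layer CNN with per-layer fusion modules.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_backbone(x, w):
    """Siamese branch: the SAME weights w are applied to both modalities
    (linear map followed by ReLU, standing in for a CNN backbone)."""
    return np.maximum(x @ w, 0.0)

def fuse(f_rgb, f_ir, alpha):
    """Hypothetical fusion module: a weighted sum of the two modal
    features, with alpha in [0, 1] playing the role of a learned weight."""
    return alpha * f_rgb + (1.0 - alpha) * f_ir

# Toy inputs: one flattened color patch and one flattened infrared patch.
x_rgb = rng.normal(size=(1, 16))
x_ir = rng.normal(size=(1, 16))
w = rng.normal(size=(16, 8))  # shared branch weights

f_rgb = shared_backbone(x_rgb, w)   # independent color features
f_ir = shared_backbone(x_ir, w)     # independent infrared features
f_fused = fuse(f_rgb, f_ir, alpha=0.5)

print(f_fused.shape)  # (1, 8)
```

In the full network, a detection head would consume both the fused features and the independent per-modality features, with supervised learning balancing their contributions.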