With the advance of artificial intelligence, unmanned aerial vehicles (UAVs) have been applied in many fields. For example, combining UAVs with object detection has played a significant role in sectors that bear on people’s livelihood, such as emergency rescue, disaster relief, aerial work, traffic monitoring, agricultural development, and news reporting. Because targets in UAV aerial images vary widely in scale and a large proportion of them are small, object detection models designed for natural scenes are difficult to apply directly to UAV aerial images. To address this, the main work of this paper is as follows:

(1) To address the multi-scale variation of targets in UAV aerial images, this paper designs a Feature Enhancement Module (FEM) and a Feature Enhancement Network (FENet) built on this module. Specifically, the FEM is a multi-branch residual structure composed of convolutional kernels of different sizes. It extracts target features efficiently and passes their multi-scale features to the next stage, which helps the network learn highly discriminative features. Visualizing each channel of the feature map further shows that the feature channels are similar to one another. To reduce the computational cost of the network model without affecting detection accuracy, this paper designs a dilation operation that generates similar feature channels mainly through low-cost convolutions. In addition, to handle the large proportion of small targets in UAV aerial images, a Guided Attention Module (GAM) is proposed and embedded into the feature pyramid structure. This module uses weights generated from deep semantic features to guide the shallow feature map, which strengthens feature fusion and improves the network model's detection of small targets on the shallow feature map.

(2) To address
the problems of complex anchor parameter tuning and high computational overhead caused by the large number of preset anchors in anchor-based object detection networks, this paper proposes a Multi-Strategy Interactive Network (MSINet), based on Fully Convolutional One-Stage Object Detection (FCOS), to detect targets in UAV aerial images. Specifically, a contribution label strategy is proposed that introduces the quality information of the predicted bounding box into the training of the classification branch. This resolves two problems: the classification branch and the Center-ness branch are used inconsistently between the training and testing stages, and, during training, the quality of the predicted bounding boxes of positive samples is evaluated only by the Center-ness branch. Because Focal Loss (FL) does not apply to contribution labels in continuous form, an Improved Focal Loss (I-FL) is designed so that contribution labels can serve as the supervisory signal for the classification task. Finally, a dynamic sample weighting strategy is proposed that generates a weight factor for each sample according to its contribution label, allowing the network model to focus on high-quality samples during training and thus improving its overall detection performance.
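
To illustrate how a focal-style loss can accept a continuous label, the sketch below uses one commonly used generalization: the modulating factor becomes |y - p|^gamma and the cross-entropy term uses the soft label y. This form is an assumption for illustration; the abstract does not give the actual definition of I-FL, which may differ. The function names, the default gamma = 2, and the weight rule w = 1 + y are likewise hypothetical stand-ins for the contribution label and the dynamic sample weighting described above.

```python
import math

def soft_focal_loss(p, y, gamma=2.0, eps=1e-12):
    """Focal-style loss for a continuous label y in [0, 1].

    p is the predicted classification score in (0, 1).
    Assumed form (not taken from the paper):
        |y - p|^gamma * BCE(p, y)
    With a hard label y = 1 this reduces to (1 - p)^gamma * -log(p),
    i.e. the standard focal loss without the alpha balancing term.
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp to keep the logs finite
    bce = -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return abs(y - p) ** gamma * bce

def weighted_mean_loss(scores, labels, gamma=2.0):
    """Average loss with a per-sample weight derived from the label.

    The weight w = 1 + y is a hypothetical choice: samples with a
    higher contribution label (higher-quality boxes) count more.
    """
    total = 0.0
    for p, y in zip(scores, labels):
        w = 1.0 + y
        total += w * soft_focal_loss(p, y, gamma)
    return total / max(1, len(scores))

if __name__ == "__main__":
    # A score far below a high-quality label is penalized much more
    # heavily than a score close to it.
    print(soft_focal_loss(0.2, 0.9), soft_focal_loss(0.8, 0.9))
```

In this sketch the loss vanishes when the predicted score equals the label, so the classification score is pushed toward the box quality instead of toward 1, which is the property that lets a single score rank boxes at test time.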