With the rapid development of deep learning, object detection has made great breakthroughs and is now widely used in many fields, including intelligent surveillance, medical imaging and autonomous driving. General-purpose object detection algorithms handle conventional ground-level images efficiently. However, their performance drops significantly when they are applied directly to vehicle detection in aerial images. The main reasons are that targets occupy relatively few pixels in aerial images, target scales vary greatly, and backgrounds are complex. Vehicle detection in aerial images is therefore extremely challenging. This thesis carries out a series of studies on vehicle detection in aerial images and makes several improvements to YOLOv5 so that it is better suited to this task. The main contributions of this thesis are summarized as follows.

First, poor lighting conditions make the features in aerial images indistinct. To address this problem, this thesis proposes a context fusion module, an attention aggregation module and a self-refinement module. The context fusion module uses parallel convolution layers to fuse contextual information from different receptive fields, and it captures global contextual information through global average pooling. This information is necessary for detection in aerial images with indistinct target features and complex scenes. The attention aggregation module uses an attention mechanism to adaptively aggregate multi-level features and to strengthen the features with higher discriminative ability. The self-refinement module further refines and enhances the feature maps. Embedding the three modules into YOLOv5 yields YOLO-CAR. Experimental results show that YOLO-CAR outperforms YOLOv5 for vehicle detection in aerial images.

Second, objects in aerial images occupy few pixels. To address this problem, this thesis proposes a deconvolutional feature fusion module, inspired by super-resolution techniques, that contains an up-projection unit and a down-projection unit. The module reduces the loss of key information when a high-resolution feature map is down-sampled to a low-resolution one, and it generates features conducive to detection when a low-resolution feature map is up-sampled to a high-resolution one. It makes full use of both low-level and high-level features, so that the fused feature map contains rich spatial information as well as rich semantic information. Embedding the module into YOLOv5 yields YOLO-DUM. Experimental results show that YOLO-DUM outperforms YOLOv5 for vehicle detection in aerial images.
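To make the idea of a context fusion module concrete, the following is a minimal, parameter-free sketch on a single-channel feature map, not the thesis implementation. It assumes that the "different receptive fields" come from parallel 3×3 convolutions with dilation rates 1, 2 and 3, that the branches are fused by summation (a learned module would more typically concatenate them and apply a 1×1 convolution), and that the global context is a globally average-pooled scalar broadcast back to every position. The function names and dilation rates are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def dilated_conv3x3(x: Grid, kernel: Grid, dilation: int) -> Grid:
    """3x3 cross-correlation with zero 'same' padding.

    A larger dilation spreads the same nine taps over a wider window,
    which enlarges the receptive field without adding weights.
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    r = i + (di - 1) * dilation
                    c = j + (dj - 1) * dilation
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside the map
                        acc += kernel[di][dj] * x[r][c]
            out[i][j] = acc
    return out


def global_average_pool(x: Grid) -> float:
    """Mean over all spatial positions: one scalar of global context."""
    return sum(sum(row) for row in x) / (len(x) * len(x[0]))


def context_fusion(x: Grid, kernel: Grid,
                   dilations: Sequence[int] = (1, 2, 3)) -> Grid:
    """Hypothetical fusion: sum of parallel dilated branches plus broadcast GAP."""
    branches = [dilated_conv3x3(x, kernel, d) for d in dilations]
    g = global_average_pool(x)
    h, w = len(x), len(x[0])
    return [[sum(b[i][j] for b in branches) + g for j in range(w)]
            for i in range(h)]
```

Feeding a 7×7 map with a single nonzero value at the center and an all-ones kernel shows the effect of the receptive fields: the corner position is reached only by the dilation-3 branch, while every position also receives the same global-context term.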
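The abstract does not specify how the attention aggregation module computes its weights. One common pattern, shown here only as a hedged sketch, summarizes each feature level by global average pooling, scales that summary by a per-level gate (a hypothetical stand-in for learned parameters), and applies a softmax across levels so that the weights sum to 1. The levels are assumed to be already resized to the same shape and flattened. Practical modules usually predict per-position weights with 1×1 convolutions instead of one scalar per level.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(levels: Sequence[Sequence[float]],
                        gate: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Softmax-weighted sum of same-sized, flattened multi-level features.

    Returns the aggregated feature and the per-level weights.
    `gate` is a hypothetical per-level scaling of the pooled score.
    """
    pooled = [sum(f) / len(f) for f in levels]          # GAP per level
    weights = softmax([g * p for g, p in zip(gate, pooled)])
    n = len(levels[0])
    fused = [sum(w * f[i] for w, f in zip(weights, levels)) for i in range(n)]
    return fused, weights
```

Levels whose pooled response is stronger receive larger weights, which is one simple way to "intensify features with higher recognition ability". When all levels score equally, the result reduces to a plain average.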
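Up- and down-projection units in super-resolution are commonly built on the back-projection idea. A unit makes an initial estimate at the target resolution, projects it back to the source resolution, measures the mismatch, and projects that residual forward as a correction. The sketch below assumes that the thesis's units follow this pattern. It works on one-dimensional signals for brevity, and it replaces the learned deconvolution and strided convolution with fixed 2× linear interpolation and pair averaging. With fixed operators the residual is taken as "source minus re-projection" so that adding it corrects the estimate. In a learned unit the trainable kernels absorb this sign.

```python
from typing import List

Signal = List[float]


def upsample2(x: Signal) -> Signal:
    """2x linear-interpolation upsampling (the last sample is held at the edge)."""
    out: Signal = []
    for i, v in enumerate(x):
        nxt = x[i + 1] if i + 1 < len(x) else v
        out.extend([v, (v + nxt) / 2])
    return out


def downsample2(x: Signal) -> Signal:
    """2x downsampling by averaging non-overlapping pairs."""
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]


def up_projection(low: Signal) -> Signal:
    h0 = upsample2(low)                                   # initial high-res estimate
    err = [a - b for a, b in zip(low, downsample2(h0))]   # low-res mismatch
    return [a + b for a, b in zip(h0, upsample2(err))]    # back-project the correction


def down_projection(high: Signal) -> Signal:
    l0 = downsample2(high)                                # initial low-res estimate
    err = [a - b for a, b in zip(high, upsample2(l0))]    # high-res mismatch
    return [a + b for a, b in zip(l0, downsample2(err))]  # fold detail lost by pooling back in
```

For example, `up_projection([0.0, 4.0, 8.0, 4.0])` returns `[-1.0, 1.0, 3.0, 6.0, 9.0, 6.5, 4.0, 4.0]`. Down-sampling that result is closer to the input than down-sampling the plain interpolation, which illustrates how the correction step reduces information loss between resolutions.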