Deep Learning Based Aerial Image Target Detection Algorithm Research

Posted on: 2024-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: L M Zhao
GTID: 2542307061472044
Subject: Signal and Information Processing
Abstract/Summary:
In recent years, drones have become increasingly common in daily life and are widely deployed in scenarios where objective constraints limit other approaches, such as drone-based warning, capture, and persuasion. They are gradually becoming irreplaceable in complex scenes. UAV aerial images have practical and research value in fields such as military reconnaissance, emergency response, intelligent transportation, and agricultural pest control. As UAV aerial photography matures, the spatial resolution of aerial images keeps rising and their detail grows richer, which lays a solid foundation for research on small-target detection, multi-scale target detection, and target detection in complex scenes. With continued advances in computing power and in deep learning-based target detection algorithms, target detection for aerial images has become a research hotspot. Aerial images differ markedly from generic target detection datasets: they contain a large proportion of small targets, target sizes vary widely, and targets often resemble the background. Generic detection algorithms therefore need to be adapted to these characteristics to obtain better results. This paper improves on the generic detector YOLOv5 and proposes a multi-scale UAV aerial image target detection algorithm based on coordinate attention and Swin Transformer, and then further improves the C3 feature extraction module and the feature fusion network of the original YOLOv5 algorithm. The main work of this paper is as follows:

(1) Aerial images contain a high proportion of small targets, and because they are captured from overhead, many targets are occluded or have incomplete pixel information. As a result, the original YOLOv5 algorithm exhibits high false detection and missed detection rates. To address this problem, a tiny-target detection layer is added to the original algorithm. In addition, for the bounding box regression task of the prediction head, the original GIoU loss function is replaced with the Alpha-IoU loss function, which improves the detection performance of the network model.

(2) To handle the large differences in target size and the complex background information in aerial images, coordinate attention and Swin Transformer modules are added to the original YOLOv5 algorithm. Coordinate attention decomposes channel attention into two one-dimensional feature encoding processes that aggregate features along the two spatial directions separately. In this way, long-range dependencies can be captured along one spatial direction while precise location information is retained along the other. The resulting feature maps are then encoded separately into a pair of direction-aware and position-sensitive attention maps, which are applied complementarily to the input feature map to enhance the representation of the objects of interest. This paper places a coordinate attention module in the Backbone of YOLOv5, which lets the network extract feature information more efficiently, suppress unimportant information such as the background of aerial images, and focus on the targets to be detected. This paper also introduces Swin Transformer, a hierarchical Transformer computed with shifted windows. Swin Transformer gains efficiency by restricting self-attention to local windows, which helps the network focus on regions of interest and ignore background information. Shifting the windows lets adjacent windows exchange information and creates cross-window connections between consecutive layers, which indirectly achieves global modeling. The Swin Transformer module is placed in the Neck of YOLOv5, where it captures global image information and rich context, helps with multi-scale targets, and extracts target information while suppressing the background.

(3) Generic target detection algorithms perform poorly on aerial images. To address this problem, this paper modifies the feature extraction and feature fusion parts of the network structure. In the Backbone, the original YOLOv5 algorithm uses the C3 module, which draws mainly on the feature-splitting idea of CSPNet combined with the idea of residual structure. In this paper, the C3 module in the Backbone is replaced with the C2f module, which draws on both the C3 module and the ELAN idea used in YOLOv7. The ELAN design provides richer gradient information, so the feature information of targets can be extracted more efficiently. To fuse the features extracted by the Backbone more effectively, this paper replaces the PAFPN module in the Neck with GFPN (Generalized FPN) from DAMO-YOLO, which fully exchanges high-level semantic information and low-level spatial information.
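The Alpha-IoU replacement described in item (1) can be illustrated with a minimal sketch. It assumes the basic power form of the loss, 1 - IoU^alpha, with axis-aligned boxes given as (x1, y1, x2, y2). The function names and the default alpha = 3 are illustrative assumptions; the abstract does not say which alpha value or which penalty-term variant the thesis uses.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def alpha_iou_loss(pred, target, alpha=3.0):
    """Basic power-form Alpha-IoU loss: 1 - IoU**alpha.

    alpha=3.0 is an assumed default for illustration; alpha=1.0 gives
    the plain IoU loss.
    """
    return 1.0 - iou(pred, target) ** alpha


if __name__ == "__main__":
    pred = (0.0, 0.0, 2.0, 2.0)
    target = (1.0, 0.0, 3.0, 2.0)
    print(alpha_iou_loss(pred, target, alpha=1.0))
    print(alpha_iou_loss(pred, target, alpha=3.0))
```

Raising the IoU to a power greater than one changes how the loss and its gradient are weighted across IoU levels, which is the general motivation for this family of losses; whether that accounts for the gains reported in the thesis is not stated in the abstract.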
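The directional pooling behind coordinate attention, described in item (2), can be sketched for a single channel in plain Python. This is a deliberately simplified illustration: it keeps only the two one-dimensional poolings and the sigmoid gating, and omits the learned 1x1 convolutions, normalization, and cross-channel mixing that a real coordinate attention module contains. All function names are hypothetical and not taken from the thesis.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def coordinate_gates(fmap):
    """Compute per-row and per-column gates for one H x W feature map.

    fmap is a list of H rows, each a list of W floats.
    """
    h, w = len(fmap), len(fmap[0])
    # Pool along the width: one descriptor per row (keeps vertical position).
    row_desc = [sum(row) / w for row in fmap]
    # Pool along the height: one descriptor per column (keeps horizontal position).
    col_desc = [sum(fmap[i][j] for i in range(h)) / h for j in range(w)]
    # Simplification: gate directly on the pooled values, with no learned layers.
    gate_h = [sigmoid(v) for v in row_desc]
    gate_w = [sigmoid(v) for v in col_desc]
    return gate_h, gate_w


def apply_coordinate_attention(fmap):
    """Reweight each position by the product of its row gate and column gate."""
    gate_h, gate_w = coordinate_gates(fmap)
    return [
        [fmap[i][j] * gate_h[i] * gate_w[j] for j in range(len(gate_w))]
        for i in range(len(gate_h))
    ]


if __name__ == "__main__":
    feature_map = [[1.0, 3.0], [5.0, 7.0]]
    for row in apply_coordinate_attention(feature_map):
        print(row)
```

Because each gate depends on a whole row or a whole column, every output position is influenced by features far away along one axis while still being indexed precisely along the other, which is the property the abstract attributes to coordinate attention.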
Keywords/Search Tags: Deep learning, Aerial images, Object detection, Residual networks, YOLOv5