
Research on Small Target Detection Method of Remote Sensing Image Based on Residual Convolutional Network and Transformer Fusion

Posted on: 2024-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: J L Wei
GTID: 2542307142452324
Subject: Computer technology
Abstract/Summary:
Object detection is currently one of the most active research tasks. It requires accurate semantic detection of images so that they can be fully analyzed and used in military and civilian applications such as security, transportation, and rescue. However, remote sensing images are captured from a bird's-eye view rather than the frontal or side views of everyday photographs, so the objects they contain are smaller in scale and vary inherently in orientation. As a result, object detection algorithms designed for natural images often perform poorly when applied directly to satellite images, and small object detection in remote sensing images has become a focus of research in both industry and academia. This thesis addresses two problems of small objects in remote sensing images: they carry a low proportion of the image's information, and their features are difficult to extract. Building on YOLOX, it proposes two small object detection algorithms: an improved Attention Cross-stage Transformer network (ACSTNet) and a Bidirectional Attentional YOLOX network (BAM-YOLOX). The specific research contents are as follows.

(1) To address the insufficient feature information of small objects in remote sensing images, this thesis adds Patch Partition to the upper layer of the backbone network, which makes the patch division at each layer finer. Cross-window connections and a shifted-window strategy improve efficiency and enlarge the receptive field, and residual modules pass more of the small objects' feature information to subsequent layers. This lets the model perform denser latent semantic exchange and deepens the interaction of information across levels. In addition, a new 160 px × 160 px feature output branch is added above the dark3 stage to strengthen the shallow, low-level features that are rich in small-object detail, which effectively improves small object detection accuracy. Next, to address the complex environmental information in remote sensing images and the weak distinction between foreground and background, this thesis designs a parallel Swin Transformer structure that deepens the interaction between two kinds of feature extractors. It combines the convolutional neural network, which is more sensitive to local image features, with the Transformer, which is more sensitive to relationships between pixels and to global information. The results demonstrate that this fusion of convolution and self-attention significantly improves the model's ability to distinguish foreground from background.

(2) Remote sensing object detection also faces large differences in the proportion that objects occupy within their annotation boxes, redundant background information, and small objects that mostly appear in dense, hard-to-separate clusters. To address these problems, this thesis proposes an efficient channel and space normalized fusion attention mechanism (ECSNFAM), which fuses spatial, channel, and batch-normalization empirical weights as soft attention and combines feature-map information at the neck so that the model focuses better on the features of the objects being detected. Because ECSNFAM substantially increases computation and lacks global information, this thesis also adds a double-ended attention module in the shallow layers of the neck network. The module uses the Transformer's ability to extract pixel-to-pixel relationships to compensate for the convolutional network's lack of global attention to objects, and it uses only two attention branches, which reduces the number of parameters of ECSNFAM's pixel-level attention mechanism while improving accuracy.

Experiments on the DIOR and RSDO-DATA remote sensing datasets were conducted to evaluate the proposed method. It outperforms the YOLOX baseline, improving mAP@0.5 by 1.2% and 1.4% on the two datasets, respectively.
Keywords/Search Tags: Small object detection, Self-attention mechanism, Residual convolution, Transformer, Convolutional attention mechanism
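The patch-partition and shifted-window strategy in contribution (1) follows the Swin Transformer design. The following is a minimal, framework-free sketch, assuming a single-channel feature map stored as a list of lists; the names `window_partition`, `cyclic_shift` and `window_ids` are hypothetical and not taken from the thesis. It shows why shifting the map before partitioning creates cross-window connections: after a cyclic shift of one position, a single 2 × 2 window contains tokens from four different original windows.

```python
from typing import List

Grid = List[List[int]]


def window_partition(grid: Grid, win: int) -> List[Grid]:
    """Split an H x W grid into non-overlapping win x win windows (row-major order)."""
    h, w = len(grid), len(grid[0])
    if h % win or w % win:
        raise ValueError("grid size must be divisible by the window size")
    windows = []
    for top in range(0, h, win):
        for left in range(0, w, win):
            windows.append([row[left:left + win] for row in grid[top:top + win]])
    return windows


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and left by `shift` positions (like torch.roll with negative shifts)."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + shift) % h][(j + shift) % w] for j in range(w)] for i in range(h)]


def window_ids(size: int, win: int) -> Grid:
    """Label every position with the index of the window it belongs to before shifting."""
    per_row = size // win
    return [[(i // win) * per_row + (j // win) for j in range(size)] for i in range(size)]


if __name__ == "__main__":
    ids = window_ids(4, 2)
    regular = window_partition(ids, 2)
    shifted = window_partition(cyclic_shift(ids, 1), 2)
    print(sorted({v for row in regular[0] for v in row}))  # [0]
    print(sorted({v for row in shifted[0] for v in row}))  # [0, 1, 2, 3]
```

In Swin Transformer the shift is half the window size, and an attention mask prevents tokens that were not adjacent before the roll from attending to each other; that mask is omitted here for brevity.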
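The size of the added 160 px × 160 px branch can be related to the backbone strides with simple arithmetic. This sketch assumes a 640 × 640 input (a common YOLOX setting; the abstract does not state the input size) and the standard YOLOX strides of 8, 16 and 32 for dark3, dark4 and dark5. Under those assumptions a 160 × 160 map corresponds to stride 4; the key `p2_assumed` is a hypothetical label for that branch. The second function shows how many cells a 16 px object spans at each scale, which is why a finer map helps small objects.

```python
from typing import Dict

# Assumed strides: YOLOX's dark3/dark4/dark5 outputs plus a hypothetical stride-4 branch.
STRIDES: Dict[str, int] = {"p2_assumed": 4, "dark3": 8, "dark4": 16, "dark5": 32}


def feature_map_sizes(input_size: int, strides: Dict[str, int]) -> Dict[str, int]:
    """Side length of each square feature map for a square input image."""
    return {name: input_size // s for name, s in strides.items()}


def cells_per_object(object_px: int, stride: int) -> float:
    """Number of feature-map cells an object of the given side length spans along one axis."""
    return object_px / stride


if __name__ == "__main__":
    print(feature_map_sizes(640, STRIDES))
    # {'p2_assumed': 160, 'dark3': 80, 'dark4': 40, 'dark5': 20}
    for name, s in STRIDES.items():
        print(name, cells_per_object(16, s))
    # prints "p2_assumed 4.0", then "dark3 2.0", "dark4 1.0", "dark5 0.5" on separate lines
```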
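The motivation for running a convolutional branch and a Swin Transformer branch in parallel is that a convolution's output depends only on a local neighborhood, while self-attention lets every position depend on every other. The minimal sketch below demonstrates that difference on a 1-D token sequence. It is an illustrative assumption rather than the thesis's architecture: it uses a fixed 3-tap kernel, a single attention head with identity query, key and value projections, and fuses the two branches by element-wise addition, since the abstract does not specify the fusion operator. All function names are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def conv1d_same(tokens: List[Vec], kernel: List[float]) -> List[Vec]:
    """Apply one odd-length kernel to every feature, with zero padding (local receptive field)."""
    k = len(kernel) // 2
    n, d = len(tokens), len(tokens[0])
    out = []
    for i in range(n):
        acc = [0.0] * d
        for t, w in enumerate(kernel):
            j = i + t - k
            if 0 <= j < n:
                for c in range(d):
                    acc[c] += w * tokens[j][c]
        out.append(acc)
    return out


def self_attention(tokens: List[Vec]) -> List[Vec]:
    """Single-head scaled dot-product attention with identity Q/K/V projections (global)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out


def parallel_fusion(tokens: List[Vec], kernel: List[float]) -> List[Vec]:
    """Run both branches on the same input and fuse them by element-wise addition (assumed)."""
    local = conv1d_same(tokens, kernel)
    global_ = self_attention(tokens)
    return [[a + b for a, b in zip(l, g)] for l, g in zip(local, global_)]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
    changed = tokens[:-1] + [[3.0, -2.0]]  # alter only the most distant token
    kernel = [0.25, 0.5, 0.25]
    print(conv1d_same(tokens, kernel)[0] == conv1d_same(changed, kernel)[0])  # True
    print(self_attention(tokens)[0] == self_attention(changed)[0])  # False
```

Changing the last token leaves the convolution output at position 0 untouched but alters the attention output there, which is the local-versus-global complementarity the abstract relies on.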
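ECSNFAM is described as using batch-normalization empirical weights as soft attention, but the abstract gives no equations. As a stand-in, the sketch below implements the channel branch of the published Normalization-based Attention Module (NAM): each channel's batch-normalization scale factor γ, divided by the sum of all |γ|, weights the normalized activations before a sigmoid gate that re-scales the input. The spatial branch and the fusion used by ECSNFAM are not reproduced, and the function name is hypothetical.

```python
import math
from typing import List


def nam_channel_attention(x: List[List[float]], gamma: List[float], beta: List[float],
                          eps: float = 1e-5) -> List[List[float]]:
    """x[c] holds every activation of channel c (batch and spatial positions flattened)."""
    total = sum(abs(g) for g in gamma) or 1.0  # avoid dividing by zero if all scales are 0
    out = []
    for values, g, b in zip(x, gamma, beta):
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        weight = abs(g) / total  # channel importance taken from the BN scale factor
        gated = []
        for v in values:
            bn = g * (v - mean) / math.sqrt(var + eps) + b  # batch norm with batch statistics
            gate = 1.0 / (1.0 + math.exp(-weight * bn))     # sigmoid soft attention
            gated.append(v * gate)                           # re-weight the original activation
        out.append(gated)
    return out


if __name__ == "__main__":
    out = nam_channel_attention([[1.0, 3.0], [1.0, 3.0]], gamma=[0.0, 2.0], beta=[0.0, 0.0])
    print([[round(v, 3) for v in ch] for ch in out])  # [[0.5, 1.5], [0.119, 2.642]]
```

A channel whose scale factor is zero receives a neutral gate of 0.5, while a channel with a large scale factor suppresses below-mean activations and keeps above-mean ones, which is the "empirical weight" behavior the normalization-based design exploits.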
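The reported gains are in mAP@0.5: average precision computed with an IoU threshold of 0.5, then averaged over classes. The sketch below computes IoU and the AP for one class in one image using all-point interpolation of the precision-recall curve, as in the PASCAL VOC protocol. It is illustrative only; the evaluation code used for DIOR and RSDO-DATA in the thesis is not described in the abstract, and tools differ in details such as whether an IoU exactly equal to the threshold counts as a match.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Tuple[float, Box]], gts: List[Box], thr: float = 0.5) -> float:
    """AP for one class in one image; dets are (score, box) pairs."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp = []
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        overlaps = [iou(box, g) for g in gts]
        best_j = max(range(len(gts)), key=lambda j: overlaps[j])
        # True positive if the best-overlapping ground truth passes the threshold and is unclaimed
        if overlaps[best_j] >= thr and not matched[best_j]:
            matched[best_j] = True
            tp.append(1)
        else:
            tp.append(0)
    precisions, recalls, hits = [], [], 0
    for k, t in enumerate(tp, start=1):
        hits += t
        precisions.append(hits / k)
        recalls.append(hits / len(gts))
    ap, prev_r = 0.0, 0.0
    for i, r in enumerate(recalls):
        ap += (r - prev_r) * max(precisions[i:])  # interpolated precision at this recall level
        prev_r = r
    return ap


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
    print(round(iou((0, 0, 10, 10), (5, 0, 15, 10)), 4))  # 0.3333
    print(round(average_precision(dets, gts), 4))          # 0.8333
```

mAP@0.5 is then the mean of these per-class AP values, with detections and ground truth pooled across all test images for each class.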