In the 21st century, remote sensing images continue to emerge in massive volumes. Research on object detection in remote sensing images has contributed greatly to fields such as weather prediction and military management. However, standard object detection models are designed for natural images. Compared with natural images, remote sensing images have densely packed objects, a large proportion of small objects, and large variations in object size. As a result, conventional models cannot be applied directly to object detection in remote sensing images. This thesis studies these characteristics in depth. Starting from a standard two-stage network, it refines the model to improve detection performance on remote sensing objects. The final model achieves 80.6 mean average precision (mAP) on DOTA v1.0 and ranks 9th out of 324 entries overall. The main contributions of the thesis are as follows:

(1) Determining a backbone network suited to feature extraction from remote sensing images. The backbone is the core component of an object detection model. To obtain a backbone that performs well on remote sensing images, this thesis examines in depth how residual networks operate. Based on ablation experiments and analysis of the results, ResNeXt101-64d is selected as the backbone of the detection model, which increases its mAP by 2.3.

(2) Introducing and improving rotated object detection modules. The thesis introduces a rotated object detection mechanism into each stage of the two-stage network and replaces conventional horizontal bounding box regression with bounding box regression based on relative offsets. In the first stage, RoI Transformer, a rotated region proposal module, is improved by adding two fully connected layers, which strengthen the module's non-linear expressive power. In the second stage, a rotated bounding box regression module is introduced, which greatly improves the model's detection of densely packed objects in remote sensing images.

(3) Proposing a new instance-level denoising module named Enhanced-InLD. Building on an instance-level denoising module named InLD, the thesis enlarges the module's receptive field by adding groups of dilated convolutions, introduces deformable convolution to strengthen its spatial feature perception, and designs a feature enhancement network based on the residual structure to improve its instance-level denoising capability. The resulting module, Enhanced-InLD, increases the mAP of the detection model by 0.6.

(4) Using a multi-level attention mechanism to improve the robustness of the model. For the backbone, a new network named GDResNeXt101-64d is proposed by adding GC Blocks and deformable convolutions. For feature fusion, the BFP module is incorporated into the FPN and applied to the denoised feature maps output by the Enhanced-InLD module. A Non-local module provides global context for these feature maps and further refines their quality. The multi-level attention mechanism improves the model's performance on remote sensing images in both the feature extraction and feature fusion stages, increasing detection mAP by 0.6 and 1.2, respectively.

(5) Using multiple data augmentation methods to increase sample diversity. The thesis applies two data augmentation strategies to the detection model: multi-scale training and testing, and Mixup. These strategies expand the amount of data and diversify the sample distribution, increasing detection mAP by 0.8 and 1.6, respectively.
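Contribution (2) replaces horizontal box regression with regression on relative offsets. The following is a minimal sketch that assumes the (x, y, w, h, θ) parameterization and the offset encoding described in the original RoI Transformer paper as we understand it. The thesis may use a different angle range or normalization. The centre displacement is expressed in the RoI's own rotated frame, so the regression targets do not depend on the RoI's orientation. The function names `encode_rbox` and `decode_rbox` are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def encode_rbox(roi, gt):
    """Encode a ground-truth rotated box relative to a rotated RoI.

    Both boxes are (x, y, w, h, theta) with theta in radians. Assumed
    RoI Transformer-style encoding; not necessarily the thesis's exact form.
    """
    x, y, w, h, t = roi
    gx, gy, gw, gh, g_t = gt
    dx, dy = gx - x, gy - y
    cos_t, sin_t = math.cos(t), math.sin(t)
    # Rotate the centre displacement into the RoI's local frame, then normalize.
    tx = (dx * cos_t + dy * sin_t) / w
    ty = (-dx * sin_t + dy * cos_t) / h
    tw = math.log(gw / w)
    th = math.log(gh / h)
    # Angle difference wrapped to [0, 2*pi) and normalized to [0, 1).
    tt = ((g_t - t) % TWO_PI) / TWO_PI
    return tx, ty, tw, th, tt


def decode_rbox(roi, deltas):
    """Invert encode_rbox: map predicted offsets back to an absolute rotated box."""
    x, y, w, h, t = roi
    tx, ty, tw, th, tt = deltas
    lx, ly = tx * w, ty * h
    cos_t, sin_t = math.cos(t), math.sin(t)
    # Rotate the local displacement back into image coordinates.
    gx = x + lx * cos_t - ly * sin_t
    gy = y + lx * sin_t + ly * cos_t
    gw = w * math.exp(tw)
    gh = h * math.exp(th)
    g_t = (t + tt * TWO_PI) % TWO_PI
    return gx, gy, gw, gh, g_t
```

For an RoI rotated by π/2, a ground-truth centre displaced 5 pixels along the image's y axis lies along the RoI's local x axis. The encoding therefore yields tx = 0.5 for w = 10 and ty ≈ 0, which is the orientation-independence the relative-offset formulation is meant to provide.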
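Contribution (3) enlarges InLD's receptive field with groups of dilated convolutions. For stride-1 layers, a k×k kernel with dilation rate d spans k + (k − 1)(d − 1) pixels per side, and each stacked layer adds that span minus one to the receptive field. The sketch below computes this. The 3×3 kernel and the dilation rates (1, 2, 4) used in the example are hypothetical, because the abstract does not give the configuration of the dilated convolution groups.

```python
def effective_kernel(k, d):
    """Span (in pixels per side) of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def stacked_receptive_field(k, dilations):
    """Receptive field of stride-1 k x k convolutions stacked with the given dilations."""
    rf = 1
    for d in dilations:
        # Each stride-1 layer widens the field by (effective span - 1).
        rf += effective_kernel(k, d) - 1
    return rf


# Hypothetical group: three 3x3 convolutions with dilation rates 1, 2, 4.
print(stacked_receptive_field(3, [1, 2, 4]))  # 15
print(stacked_receptive_field(3, [1, 1, 1]))  # 7
```

With the same number of layers and parameters, the dilated group covers a 15×15 region instead of 7×7. This gain in context at no extra parameter cost is what makes dilation attractive for enlarging a module's receptive field.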
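Deformable convolution, used in contributions (3) and (4), adds an offset Δp_n to each kernel tap p_n and reads the input at the fractional position p_0 + p_n + Δp_n by bilinear interpolation. The sketch below covers a single channel and a single output position. It makes several simplifying assumptions: the offsets are supplied by hand rather than predicted by a convolution branch, there is no modulation mask, and the function names are hypothetical. With all offsets set to zero, it reduces to an ordinary 3×3 convolution as used in CNNs (cross-correlation).

```python
import math


def bilinear_sample(img, y, x):
    """Bilinearly interpolate a 2-D list at fractional (y, x); out-of-range taps read as 0."""
    height, width = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    wy1, wx1 = y - y0, x - x0
    value = 0.0
    # Accumulate the four neighbouring pixels weighted by their distance.
    for yy, wy in ((y0, 1.0 - wy1), (y0 + 1, wy1)):
        for xx, wx in ((x0, 1.0 - wx1), (x0 + 1, wx1)):
            if 0 <= yy < height and 0 <= xx < width:
                value += img[yy][xx] * wy * wx
    return value


def deform_conv3x3_at(img, weight, offsets, py, px, dilation=1):
    """One output value of a 3x3 deformable convolution centred at (py, px).

    offsets holds nine (dy, dx) pairs, one per kernel tap, in row-major order.
    """
    out = 0.0
    tap = 0
    for i, ky in enumerate((-1, 0, 1)):
        for j, kx in enumerate((-1, 0, 1)):
            dy, dx = offsets[tap]
            # Regular grid position plus a (possibly fractional) offset.
            sample = bilinear_sample(img, py + ky * dilation + dy, px + kx * dilation + dx)
            out += weight[i][j] * sample
            tap += 1
    return out


img = [[r * 4 + c for c in range(4)] for r in range(4)]
ones = [[1.0] * 3 for _ in range(3)]
print(deform_conv3x3_at(img, ones, [(0.0, 0.0)] * 9, 1, 1))  # 45.0, plain 3x3 sum
print(deform_conv3x3_at(img, ones, [(0.0, 0.5)] * 9, 1, 1))  # 49.5, every tap shifted half a pixel right
```

Because each tap reads at a learned fractional location, the kernel's sampling grid can bend to follow object shape. This is the mechanism behind the improved spatial feature perception attributed to deformable convolution above.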
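Contribution (5) applies Mixup to the detection model. One common formulation for detection works as follows. Two images are blended pixel-wise with a coefficient λ drawn from Beta(α, α). The boxes of both images are kept, and each image's box losses are weighted by λ and 1 - λ respectively. The abstract does not say which variant the thesis uses. The sketch below therefore assumes equally sized single-channel images stored as nested lists and a hypothetical α = 1.5. The function names are also hypothetical.

```python
import random


def sample_lambda(alpha=1.5, rng=None):
    """Draw the mixing coefficient from Beta(alpha, alpha); alpha=1.5 is a hypothetical choice."""
    rng = rng or random
    return rng.betavariate(alpha, alpha)


def mixup_detection(img_a, boxes_a, img_b, boxes_b, lam):
    """Blend two equally sized images and keep both box sets with loss weights."""
    if len(img_a) != len(img_b) or any(len(ra) != len(rb) for ra, rb in zip(img_a, img_b)):
        raise ValueError("this sketch assumes equally sized images")
    # Pixel-wise convex combination of the two images.
    mixed = [[lam * pa + (1.0 - lam) * pb for pa, pb in zip(ra, rb)]
             for ra, rb in zip(img_a, img_b)]
    # Boxes are not blended; each keeps a weight for its loss term.
    boxes = [(box, lam) for box in boxes_a] + [(box, 1.0 - lam) for box in boxes_b]
    return mixed, boxes


mixed, boxes = mixup_detection([[0, 10], [20, 30]], [(0, 0, 1, 1)],
                               [[10, 10], [10, 10]], [(1, 1, 2, 2)], lam=0.25)
print(mixed)  # [[7.5, 10.0], [12.5, 15.0]]
print(boxes)  # [((0, 0, 1, 1), 0.25), ((1, 1, 2, 2), 0.75)]
```

Keeping every box, rather than interpolating labels as in image classification, lets the detector see occluded and overlapping objects from both sources. This is one plausible reason Mixup increases sample diversity for densely packed scenes.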