In remote sensing image interpretation, object detection needs to become more intelligent and accurate. In recent years, deep learning, especially the convolutional neural network (CNN), has been widely used in computer vision because of its powerful feature learning capability. Applying CNNs to object detection in remote sensing images is an active research topic and has achieved remarkable results. However, object detection in remote sensing images still faces several challenges, such as the diversity of object shapes, the complexity of the background, and the rotation of objects. Detection efficiency also deserves attention in practical applications. This paper studies CNN-based object detection methods for remote sensing images. The proposed methods enhance the feature maps and improve detection accuracy and efficiency. The main research work of this paper is as follows.

1. To exploit the contextual information in remote sensing images, multi-object information and scene information are each fused with the original region features, so that the region features carry more context. The multi-object fusion method weights the region features using a generated region-to-region graph that indicates the relationships between different regions. The weighted features are then fused with the original features to enhance the multi-object information. In the scene fusion method, the pooled scene features are fused with the original features, and a cascade structure is then used to increase detection accuracy. Both methods use DetNet as the backbone network to avoid excessive pooling of features. Experimental results on the NWPU VHR-10 dataset show that the two contextual information fusion methods effectively improve detection accuracy.

2. To address the diverse object shapes, complex scenes, and uneven object distribution in remote sensing images, the proposed method incorporates the global context and generates adaptive anchor boxes based on a weight map, so that it adapts to objects of different shapes and distributions. Specifically, a global context module is first applied to enhance the features of the object regions while reducing the influence of the background. The generated weight map is then combined with the guided anchoring method to obtain diverse, adaptable anchor boxes. Finally, a modulated feature adaptation module transforms the feature maps to fit the diverse anchor boxes. Experimental results on the DIOR dataset show that the method generates adaptive anchor boxes and improves detection accuracy.

3. Two anchor-free detection methods are proposed to improve detection efficiency. Both are combined with the attention mechanism, and they predict objects through keypoint detection and set prediction, respectively. They therefore require neither anchor boxes nor multiple regression stages.

(1) The CenterNet-based method relies on keypoint detection, which avoids the dependence on anchor boxes, multiple regression stages, and regional feature extraction. First, the attention mechanism is used to enhance the feature maps of the object regions. Then, a method for generating the radius of the Gaussian kernel is proposed, and the generated radius is used to produce the keypoint labels. Finally, the predicted boxes are determined from the predicted center points and the predicted shapes of the object boxes.

(2) The DETR-based method builds on the Transformer. It treats object detection as a set prediction problem and needs no prior information for regression, including anchor boxes. DETR uses the attention mechanism in the Transformer structure to progressively strengthen the correlation between object regions, and then uses a fully connected network to predict the object boxes directly. CIoU is adopted to compute the matching cost in order to improve the training speed.

Experiments on the DIOR dataset show that the proposed methods achieve results similar to or even better than those of region-proposal-based methods, with faster detection speeds.

4. To handle rotated objects in remote sensing images, a detection method based on RetinaNet is proposed. First, a variable for predicting the rotation angle is added to RetinaNet. Then, a cascade structure with a feature refinement module is used to improve detection accuracy and to resolve the mismatch between the predicted features and the rotated boxes. Finally, RegNetX, which is obtained by combining manual design with neural architecture search, is used as the backbone network to make feature extraction more effective. Results on the DOTA dataset show that the cascade structure with the feature refinement module effectively improves the performance of RetinaNet, and that the RegNetX-based method outperforms the ResNet-based one at comparable computational cost and parameter counts.
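
As an illustration of the CIoU term mentioned for the DETR-based method, the following is a minimal sketch of the standard Complete IoU between two axis-aligned boxes: IoU, minus the normalized squared distance between box centers, minus an aspect-ratio consistency penalty. The function names `ciou` and `ciou_cost`, the `(x1, y1, x2, y2)` corner format, and the use of `1 - CIoU` as the cost are assumptions made for this example; the abstract does not specify how the matching cost is assembled.

```python
import math


def ciou(box_a, box_b, eps=1e-9):
    """Complete IoU of two boxes given as (x1, y1, x2, y2) corners (assumed format)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1

    # Plain IoU from the intersection and union areas.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (math.atan(bw / (bh + eps)) - math.atan(aw / (ah + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return iou - rho2 / c2 - alpha * v


def ciou_cost(pred_box, gt_box):
    """Hypothetical matching cost: lower is better, near 0 for identical boxes."""
    return 1.0 - ciou(pred_box, gt_box)
```

Unlike plain IoU, this quantity remains informative for non-overlapping boxes, because the center-distance term still varies with how far apart the boxes are.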