| In recent years, with the rapid development of remote sensing (RS) imaging technology, the acquisition of high-resolution RS images has become more convenient. As a crucial part of intelligent interpretation of RS images, object detection in RS images aims to accurately detect objects of interest in large-scale images and identify their categories. This task plays an important role in many practical applications in military and civilian fields. However, real-world application scenarios are often complex, involving RS images with large sizes and complex backgrounds, and objects exhibiting large variations in shape and orientation. The performance of object detection methods in these complex scenarios still needs to be improved. Therefore, using a deep-learning-based object detection framework, this thesis conducts in-depth research into the issues faced by detection systems in different scenarios with regard to the characteristics of RS images and objects, and proposes corresponding solutions. The main research results of this thesis can be summarized as follows: 1. A cascaded rotated detector is proposed for ship detection in RS images, improving the detection accuracy of ships with large aspect ratios in large-scale RS images. The distribution of ship targets in large-scale RS images is relatively sparse and uneven, and existing detection methods usually apply the same processing to every area of the image, resulting in low computational efficiency and false detections. In addition, it is difficult for existing methods to accurately locate ship targets with large aspect ratios and arbitrary orientations. Therefore, this thesis proposes a cascaded rotated detector for ship detection in large-scale RS images, which adopts a progressive prediction strategy to gradually improve the detection accuracy of ship targets. The method is built on a cascade structure. It first uses a data preprocessing module to identify potential target areas within the image, and
then gradually improves the localization accuracy of rotated bounding boxes through a cascade refinement module in a coarse-to-fine manner. 2. A detector with adaptive feature matching and enhancement is proposed for object detection in RS images, improving the detection accuracy of multi-class RS objects with arbitrary orientations. Existing detection methods do not fully consider the shape and orientation characteristics of objects in RS images during label assignment, which may lead to inappropriate feature matching and thereby hinder the optimization of the model. In addition, standard convolution operations struggle to accurately capture the features of arbitrarily oriented objects, and the mutual influence between different object categories also hinders the learning of high-quality features. Therefore, this thesis proposes a detector with adaptive feature matching and enhancement for object detection in RS images. This method adopts a new soft label assignment strategy, which adaptively adjusts the weight of each sample according to the degree of matching between its features and the ground-truth objects, thereby guiding the optimization of the model. Meanwhile, an oriented feature refinement module is designed to exploit the geometric information of rotated objects, enhancing the representation ability of the features. Furthermore, a class-aware context aggregation module is introduced to make features of different object categories easier to distinguish. 3. A task-specific heterogeneous network is proposed for object detection in RS images, improving the orientation estimation accuracy of objects with arbitrary orientations in RS images with complex backgrounds. The recognition and localization tasks in object detection have distinct emphases. Specifically, the recognition task focuses more on the semantic information of objects, while the localization task focuses on the boundary information. However, RS images often contain complex backgrounds, which may introduce a
large amount of interference for object recognition. Additionally, previous angle-based regression methods struggle to accurately estimate the orientation of objects with arbitrary directions, thereby affecting precise object localization. Therefore, this thesis proposes a task-specific heterogeneous network for object detection in RS images, which uses different structures to learn the feature representations required for the recognition and localization tasks respectively. Considering the characteristics of RS images, the recognition branch introduces an interference suppression module to suppress both background and inter-class interference, while the localization branch uses an adaptive keypoint representation to obtain precise boundary information for objects with various shapes and directions. To further strengthen the correlation between classification confidence and localization accuracy, this thesis proposes a joint-learning quality estimation module, which integrates classification and localization features to predict localization quality more accurately. 4. A refined hybrid network is proposed for object detection in RS images, improving the detection performance for objects with large aspect ratios. Object detection methods based on convolutional neural networks typically rely on dense spatial prior information to generate a large number of candidate boxes, and then remove redundant detection results through non-maximum suppression post-processing. As the predicted classification confidence cannot accurately reflect the real localization accuracy of candidate boxes, the post-processing will inevitably suppress some correct detection results and reduce the recall rate, especially for objects with large aspect ratios. Therefore, this thesis proposes a refined hybrid network for object detection in RS images, which improves the recall rate of objects in RS images. Specifically, this method first uses a dynamic query generation module to generate a
set of sparse object queries conditioned on the input image, and then uses a query decoder to refine the object queries. During the refinement stage, one-to-one matching is used so that the post-processing can be removed. In addition, an adaptive feature fusion module and a mixed query sampling strategy are designed to enhance the training of the query decoder. |
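
The one-to-one matching mentioned in the fourth contribution can be illustrated with a small sketch. The abstract does not specify the matching cost or the solver, so everything below is an assumption made for illustration: the cost function `match_cost`, its `box_weight` parameter, the `(cx, cy, w, h, angle)` box layout, and the brute-force search are all hypothetical. Detectors of this kind more commonly solve the assignment with the Hungarian algorithm. The point of the sketch is only that each ground-truth object receives exactly one query, so a near-duplicate prediction is left unmatched instead of having to be removed by non-maximum suppression.

```python
from itertools import permutations


def match_cost(query, target, box_weight=1.0):
    """Hypothetical cost: low when the query is confident in the target's
    class and its box parameters are close to the target's."""
    class_cost = -query["scores"][target["label"]]
    box_cost = sum(abs(q - t) for q, t in zip(query["box"], target["box"]))
    return class_cost + box_weight * box_cost


def one_to_one_match(queries, targets):
    """Return {target_index: query_index} with minimal total cost.

    Brute force over all assignments, so it is only suitable for tiny
    inputs. Returns an empty dict if there are fewer queries than targets.
    """
    best_cost, best_assignment = float("inf"), {}
    for chosen in permutations(range(len(queries)), len(targets)):
        cost = sum(match_cost(queries[q], targets[t]) for t, q in enumerate(chosen))
        if cost < best_cost:
            best_cost = cost
            best_assignment = {t: q for t, q in enumerate(chosen)}
    return best_assignment


# Boxes are (cx, cy, w, h, angle) in normalized units (an assumed layout).
queries = [
    {"scores": [0.9, 0.1], "box": (0.50, 0.50, 0.40, 0.05, 0.10)},
    {"scores": [0.8, 0.2], "box": (0.52, 0.50, 0.40, 0.05, 0.10)},  # near-duplicate of query 0
    {"scores": [0.1, 0.7], "box": (0.20, 0.80, 0.10, 0.10, 0.00)},
]
targets = [
    {"label": 0, "box": (0.50, 0.50, 0.40, 0.05, 0.10)},
    {"label": 1, "box": (0.20, 0.80, 0.10, 0.10, 0.00)},
]

matches = one_to_one_match(queries, targets)
unmatched = sorted(set(range(len(queries))) - set(matches.values()))
```

With this toy data, target 0 is matched to query 0 and target 1 to query 2, while the near-duplicate query 1 stays unmatched. During training, an unmatched query would typically be supervised as background, which is what allows the duplicate-removal post-processing to be dropped.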