Applications of object detection, such as face recognition on smartphones and automatic scanning of courier codes, have already become part of our daily lives. In contrast, object detection in remote sensing images is often applied in settings we are less familiar with, such as wildlife conservation, disaster relief, and national security monitoring. These applications are closely related to our safety and therefore require higher accuracy and efficiency. In this study, after surveying mainstream detection models, we chose the Transformer-based Swin Transformer as the main structure and combined it with a feature fusion encoder and a task alignment encoder to construct a detection network. We also improved the backbone network, the representation of rotated boxes, and the loss function, which improved detection performance and accuracy. The main contributions of this study are as follows:

(1) Remote sensing images often contain numerous small objects, and detecting them requires stronger feature extraction capabilities. Swin Transformer uses a window-based attention mechanism instead of a global attention mechanism to reduce computational complexity, but in doing so it loses contextual information. Although a shifted-window attention mechanism is included, the improvement it brings is limited. To solve this problem, this article designs a feature enhancement module that selects representative data from each window and computes global and channel attention. The module is added to Swin Transformer to increase the contextual information in the features.

(2) In remote sensing image object detection, targets such as airplanes, ships, and cars often appear at various angles and in clusters. To reduce the risk of mistakenly deleting overlapping objects, rotated bounding boxes are used instead of horizontal bounding boxes. However, representing rotated boxes in the rectangular coordinate system is too complex and is not conducive to model training. To simplify the representation of rotated boxes, a polar coordinate representation is adopted. Combined with the polar ring area loss function, a polar coordinate loss function is designed in this article to address the problem that the angle and the polar radius are computed separately in the loss function, with no linkage between them.

(3) Experimental validation was conducted on the DOTA dataset, achieving an mAP of 74.21%. A comparative analysis was performed against state-of-the-art models, and ablation experiments were conducted. Our proposed method exhibits excellent detection performance on small targets in remote sensing images. It also demonstrates higher detection accuracy in complex scenarios with multiple clustered objects. These results provide evidence that our proposed method is more suitable for remote sensing image object detection tasks.
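
The idea behind the polar representation in contribution (2) can be illustrated with a minimal sketch. It assumes one common polar parameterization, in which the pole is placed at the box center: all four corners of a rectangle then share a single polar radius, and opposite corners differ by an angle of π, so the box is described by five values (center, radius, two angles) instead of eight corner coordinates. The function names `corners_to_polar` and `polar_to_corners` are hypothetical, and the exact encoding and loss used in this study may differ; the loss function is not shown.

```python
import math


def corners_to_polar(corners):
    """Encode a rotated rectangle as (cx, cy, rho, theta1, theta2).

    `corners` is a list of four (x, y) points in consecutive order.
    Assumption: the pole is the box center (illustrative encoding only).
    """
    cx = sum(x for x, _ in corners) / 4.0
    cy = sum(y for _, y in corners) / 4.0
    # All four corners of a rectangle are equidistant from its center,
    # so one polar radius is enough.
    rho = math.hypot(corners[0][0] - cx, corners[0][1] - cy)
    # Polar angles of two adjacent corners; the remaining two corners
    # lie at these angles plus pi.
    theta1 = math.atan2(corners[0][1] - cy, corners[0][0] - cx)
    theta2 = math.atan2(corners[1][1] - cy, corners[1][0] - cx)
    return cx, cy, rho, theta1, theta2


def polar_to_corners(cx, cy, rho, theta1, theta2):
    """Decode (cx, cy, rho, theta1, theta2) back to four (x, y) corners."""
    angles = (theta1, theta2, theta1 + math.pi, theta2 + math.pi)
    return [(cx + rho * math.cos(a), cy + rho * math.sin(a)) for a in angles]
```

Because the radius and the two angles jointly determine every corner, an error in either quantity moves the decoded box. This is one way to see why a loss that treats the angle and the polar radius independently ignores their coupling.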