| Object detection in arbitrarily oriented scenes has become an active research area in recent years. Compared with general scenes captured from a horizontal perspective, such scenes contain more detailed information because object orientation is arbitrary. This has implications for a wide range of applications, including traffic guidance, urban planning, marine rescue, and military deployment. However, because of large scale variation, densely distributed objects, complex backgrounds, and a larger share of small targets, detecting arbitrarily oriented objects is more challenging than detecting objects in horizontal scenes. Traditional down-sampling pyramid networks cannot correctly extract object features when object sizes vary widely; to address this, an up-sampling pyramid network was added to RepPoints V2. A pyramid pooling structure was employed to connect two variable atrous convolutional networks, making full use of feature information for multi-scale feature fusion. In addition, the backbone network was replaced with a Swin Transformer, which incorporates an attention mechanism. Its window communication mechanism was used to establish windows between different layers, so that the feature map of each window could interact with the feature map of the corresponding window at other layers, thus achieving multi-scale information fusion for small-object feature extraction. Experiments on the DOTA v1 dataset demonstrated a significant performance improvement with the proposed method. When detecting arbitrarily oriented objects, setting a fixed IoU threshold for positive and negative sample allocation easily causes misclassification and missed detections, because object aspect ratios are distributed differently. To solve these problems, a dynamic adaptive positive and negative sample allocation algorithm based on object size, SADT, is proposed in this thesis. This algorithm allows the IoU threshold to change adaptively with the sample aspect ratio during sample allocation, yielding a more accurate allocation of positive and negative samples. Experimental analysis on the DOTA v1 dataset showed that the algorithm effectively improves the accuracy of the detection model.
To solve the problems of boundary discontinuity, the square-like problem, and the high sensitivity of the loss function caused by periodic changes in object angle, a rotated bounding box regression loss function, GHD, is proposed in this thesis. The object box in rotated object detection is first transformed into a Gaussian distribution, and the Hellinger distance between Gaussian distributions is then transformed into a rotated bounding box regression loss function suitable for network learning. This solves the boundary discontinuity problem to some extent and reduces the sensitivity of the loss function. In addition, a distribution focal loss term that predicts the angle separately is added to the overall loss function, which solves the square-like problem to some extent. Experimental results on the DOTA v1, DOTA v1.5, and DOTA v2 datasets show that the proposed GHD and the overall loss function are effective. |
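
The box-to-Gaussian step and the Hellinger distance behind GHD can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: a box is written as (cx, cy, w, h, theta) with theta in radians, the covariance is taken as R diag(w²/4, h²/4) Rᵀ (a convention common in Gaussian-based rotated box losses), and the function names `box_to_gaussian` and `hellinger_distance` are hypothetical. The further mapping from distance to the final GHD loss is not specified here, so the sketch stops at the distance itself.

```python
import math


def box_to_gaussian(cx, cy, w, h, theta):
    """Map a rotated box to a 2-D Gaussian (mean, covariance).

    Assumed convention: covariance = R diag(w^2/4, h^2/4) R^T, where R is
    the rotation by theta. The symmetric 2x2 covariance is returned as the
    tuple (sxx, sxy, syy).
    """
    c, s = math.cos(theta), math.sin(theta)
    a, b = w * w / 4.0, h * h / 4.0  # variances along the box axes
    sxx = a * c * c + b * s * s
    sxy = (a - b) * c * s
    syy = a * s * s + b * c * c
    return (cx, cy), (sxx, sxy, syy)


def _det(cov):
    """Determinant of a symmetric 2x2 matrix stored as (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy * sxy


def hellinger_distance(box_a, box_b):
    """Hellinger distance in [0, 1] between the Gaussians of two boxes."""
    (ax, ay), cov_a = box_to_gaussian(*box_a)
    (bx, by), cov_b = box_to_gaussian(*box_b)

    # Averaged covariance (cov_a + cov_b) / 2
    mxx = 0.5 * (cov_a[0] + cov_b[0])
    mxy = 0.5 * (cov_a[1] + cov_b[1])
    myy = 0.5 * (cov_a[2] + cov_b[2])
    det_m = _det((mxx, mxy, myy))

    # Mahalanobis term d^T M^{-1} d using the closed-form 2x2 inverse
    dx, dy = ax - bx, ay - by
    maha = (myy * dx * dx - 2.0 * mxy * dx * dy + mxx * dy * dy) / det_m

    # Bhattacharyya distance, then Hellinger: H^2 = 1 - exp(-D_B)
    d_b = 0.125 * maha + 0.5 * math.log(
        det_m / math.sqrt(_det(cov_a) * _det(cov_b))
    )
    return math.sqrt(max(0.0, 1.0 - math.exp(-d_b)))
```

Under these assumptions, a box (0, 0, 4, 2, 0) and the same box written as (0, 0, 2, 4, π/2) map to the same Gaussian, so their distance is zero up to rounding. This is the sense in which a Gaussian representation avoids the boundary discontinuity. A square box gives the same Gaussian at every angle, which is consistent with the abstract's use of a separate angle prediction term for the square-like problem.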