| Remote sensing imagery is a very different target detection environment from natural scenes: in addition to the target category and a rectangular bounding box, models applied to it must output angle information or other bounding-box correction information. Traditional target detection models therefore need to be adapted and improved according to the specific characteristics of remote sensing targets when they are transferred to remote sensing image target detection tasks. Based on these characteristics, we explore the applicability, fusion and improvement of current advanced deep learning algorithms on remote sensing detection tasks, with the aim of improving detection speed and detection accuracy. The specific work is as follows: (1) Processing and augmentation of the DOTAv1.0 dataset. First, the original four-corner-point labels are converted to four-dimensional parameters for horizontal anchor box clustering and to five-dimensional parameters for model training. Then, based on the characteristics of remote sensing images and targets, the effectiveness of three data augmentation methods (HSV augmentation, random flip augmentation and Mosaic augmentation) on the remote sensing target detection task is analyzed and validated. (2) Oriented-YOLOv5 is proposed by integrating the CSL angle output method into YOLOv5 (v5.0). The modifications are made mainly in the network head, the IoU calculation method and the loss function. A CUDA-accelerated rotated-box NMS method is introduced to better handle the dense scenes typical of remote sensing targets. Real-time detection speed and high accuracy are achieved on large (1024 × 1024 pixel) remote sensing images at the cost of a small increase in parameters and computation, or with the help of other acceleration techniques. At its fastest, Oriented-YOLOv5s reaches 85 FPS with 68.24% mAP. (3) Building on Oriented-YOLOv5s, this paper further explores ways to improve speed and accuracy. Replacing the Focus layer with a convolutional layer improves inference speed and overall accuracy. Replacing spatial pyramid pooling with a purpose-designed atrous (dilated) spatial pyramid convolution enhances the representational power of the model. Random multi-scale training is introduced to improve the robustness of the model. The SiLU activation function is introduced to improve detection speed and accuracy. Several methods that bring no improvement but reveal characteristics of the model were also found during the experiments and are analyzed in the paper. (4) SWIN-Oriented R-CNN is proposed. A new structure is investigated that improves the accuracy of Oriented R-CNN by using the SWIN backbone network in place of its ResNet backbone network. On this basis, the effect of several pyramid fusion structures on the Transformer backbone is further explored to improve the effective representation of the model, raising mAP to 74.1%. The SWIN-Oriented R-CNN model also performs well on the large fine-grained remote sensing dataset FAIR1M, with mAP reaching 41.13%. Oriented-YOLOv5 brings lightweight real-time detectors to remote sensing target detection tasks and improves them, reducing the hardware requirements of remote sensing detectors. SWIN-Oriented R-CNN further improves high-precision remote sensing target detectors and provides a baseline for applying Transformer backbone networks in remote sensing image target detection algorithms. The research results of this paper not only extend the fusion capability of the latest deep learning algorithms, but also help to advance the practical application of remote sensing image detection algorithms. |
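
The label conversions in step (1) and the CSL angle encoding in step (2) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names (`poly_to_hbb`, `poly_to_obb`, `csl_encode`), the long-side angle convention with the angle in [0, 180) degrees, the 180 one-degree bins and the Gaussian window with radius 6 are all assumptions made for illustration. The corner points are also assumed to form a true rectangle listed in order.

```python
import math


def poly_to_hbb(poly):
    """Four corner points [x0, y0, ..., x3, y3] -> horizontal box (cx, cy, w, h).

    This is the four-dimensional form, e.g. for horizontal anchor clustering.
    """
    xs, ys = poly[0::2], poly[1::2]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    return ((x_min + x_max) / 2, (y_min + y_max) / 2,
            x_max - x_min, y_max - y_min)


def poly_to_obb(poly):
    """Four corner points -> (cx, cy, long_side, short_side, theta_deg).

    Assumption: the points form a rectangle and are listed in order.
    Assumption: theta is the direction of the long side, in [0, 180).
    """
    pts = list(zip(poly[0::2], poly[1::2]))
    cx = sum(x for x, _ in pts) / 4
    cy = sum(y for _, y in pts) / 4
    (x0, y0), (x1, y1), (x2, y2) = pts[0], pts[1], pts[2]
    e1 = math.hypot(x1 - x0, y1 - y0)  # first edge length
    e2 = math.hypot(x2 - x1, y2 - y1)  # second edge length
    if e1 >= e2:
        long_side, short_side, dx, dy = e1, e2, x1 - x0, y1 - y0
    else:
        long_side, short_side, dx, dy = e2, e1, x2 - x1, y2 - y1
    theta = math.degrees(math.atan2(dy, dx)) % 180.0
    return (cx, cy, long_side, short_side, theta)


def csl_encode(theta_deg, num_bins=180, radius=6):
    """Encode an angle as a circular smooth label (Gaussian window).

    Assumption: one bin per degree and sigma equal to the window radius.
    Bins farther than `radius` from the target bin are set to 0.
    """
    center = int(round(theta_deg)) % num_bins
    label = []
    for i in range(num_bins):
        d = abs(i - center)
        d = min(d, num_bins - d)  # circular distance, so bin 179 neighbours bin 0
        if d <= radius:
            label.append(math.exp(-(d * d) / (2.0 * radius * radius)))
        else:
            label.append(0.0)
    return label


if __name__ == "__main__":
    box = [0, 0, 0, 4, -2, 4, -2, 0]  # a 4 x 2 rectangle standing upright
    print(poly_to_hbb(box))
    print(poly_to_obb(box))
    print(csl_encode(poly_to_obb(box)[4])[88:93])
```

Because the label is circular, an angle near 0 degrees also puts weight on the bins near 179 degrees. This periodicity is what lets angle prediction be treated as classification without a large penalty at the angular boundary.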