| Underwater object detection is one of the key technologies in the visual perception system of autonomous underwater robots. It uses image information collected by sensors to identify objects quickly and accurately, providing the basis for subsequent object tracking and positioning. This work focuses on underwater objects such as sea cucumbers and sea urchins. First, the current status and significance of underwater object detection research are introduced, together with the network structure of deep learning object detection algorithms. On this basis, the difficulties of underwater object detection are analyzed, and the backbone feature network and multi-scale feature fusion are studied. A deep learning based underwater object detection model is proposed and trained; it improves the accuracy of underwater object detection, offers a new solution for capturing marine objects, and provides new technical support for research on marine objects. The main work is as follows: (1) The ATSS (Adaptive Training Sample Selection) algorithm is improved to address water turbidity and poor lighting in the underwater environment and the low detection accuracy of small-scale targets photographed at long range. First, the backbone feature network of the original algorithm is improved: two modules with dilated (atrous) convolution are designed and inserted into the ResNet-50 network structure to enlarge the receptive field of the model. Second, the three feature maps C3, C4, and C5 of the feature pyramid network are processed by three convolutions with different dilation rates, and top-down and bottom-up feature fusion is performed to generate robust features that are rich in detail information. (2) Because the improved algorithm above still fails to detect some small targets, a multi-scale balanced feature fusion module and an attention-based Conv2Former module are proposed and added to the feature pyramid network. First, feature filtering is performed before adjacent-layer features are fused, highlighting relevant feature information and suppressing irrelevant background information so that information from each layer fuses better. Second, the Conv2Former module is added to the feature pyramid network. After repeated calculations and experiments, a large convolution kernel of appropriate size is selected. The main body of the convolutional modulation module, formed by multiplying the large-kernel convolution with other convolutions, achieves local information interaction, while the Transformer module captures the global semantic information in the image. The convolution operation and the Transformer module complement each other to achieve information interaction and thereby detect small targets better. |
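
The effect of dilated convolution on the receptive field, which motivates the backbone and feature pyramid changes described above, can be illustrated with a small calculation. The sketch below is not the model proposed in this work. It only applies the standard relation that a kernel of size k with dilation rate d spans k + (k - 1)(d - 1) input pixels. The function names, the 3x3 kernel size, and the dilation rates 1, 2, and 3 are assumptions chosen for illustration, since the abstract does not state the actual values.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Input span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of a stack of convolution layers.

    `layers` is a list of (kernel_size, stride, dilation) tuples,
    ordered from the input side to the output side.
    """
    rf = 1    # receptive field of one output element, in input pixels
    jump = 1  # spacing between adjacent output elements, in input pixels
    for kernel_size, stride, dilation in layers:
        k_eff = effective_kernel_size(kernel_size, dilation)
        rf += (k_eff - 1) * jump
        jump *= stride
    return rf


# Hypothetical dilation rates, chosen only for illustration.
for d in (1, 2, 3):
    two_layers = [(3, 1, d)] * 2  # two stacked 3x3 convolutions, stride 1
    print(d, effective_kernel_size(3, d), receptive_field(two_layers))
# prints:
# 1 3 5
# 2 5 9
# 3 7 13
```

With the same number of weights per kernel, raising the dilation rate from 1 to 3 widens the receptive field of two stacked 3x3 layers from 5 to 13 pixels, which is why dilated convolution is a common way to add context without adding parameters.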