| With advances in object detection and the advent of hardware accelerators such as Graphics Processing Units (GPUs), object detection has moved from a largely theoretical pursuit to practical industrial application. Real-world scenarios nevertheless present many challenges that limit the effectiveness of object detection algorithms. The complexity and diversity of real environments lead to problems such as large variations in target scale, interference from background noise, and the difficulty of deploying large-scale network models. Constructing algorithmic models that address these challenges has therefore become an urgent requirement. This thesis investigates the challenges of practical object detection and proposes a simple yet efficient algorithmic model. Specifically, it explores key technologies in object detection models, including target feature extraction, attention mechanisms, and multi-scale feature fusion, in order to balance detection accuracy and speed. The main contributions of this thesis are summarized as follows: (1) To balance model accuracy and speed in practical object detection, an object detection method based on structural re-parameterization and attention mechanisms is studied. First, structural re-parameterization is used to decouple the training and inference processes of the detection model, reducing the complexity of the model at inference time. Then, a multi-dimensional attention mechanism is proposed. A channel attention module is constructed first; by modeling the importance of each channel in the feature map, it improves the ability of the convolutional neural network to capture feature information. Next, a non-local spatial attention module is constructed, which exploits the dependence between the region where the target
feature is located and the contextual information at different distances around it to strengthen the network’s perception of spatial location information and alleviate the loss of spatial information. This method balances the detection accuracy and speed of the network model without adding many parameters. (2) To address the large variation in target scale in object detection, an object detection method based on multi-scale attention-supervised fusion is studied. First, a multi-scale receptive field module is designed to enhance the ability of the feature layers to extract features of targets of different sizes. Then, an attention-supervised fusion module is designed, which consists of an aggregation part and an attention supervision part. The aggregation part constructs the fusion between feature layers of different scales. The attention supervision part computes the importance of the channels of adjacent-scale feature layers and uses it to generate a weight for the semantic information of each feature map to be fused, realizing a weighted fusion of feature maps and further improving the network's feature extraction ability for targets of different scales. |
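
The decoupling of training and inference through structural re-parameterization, described in contribution (1), can be illustrated with a small sketch. The abstract does not specify the branch layout used in the thesis, so the sketch assumes a RepVGG-style block (a 3x3 branch, a 1x1 branch, and an identity branch, with batch normalization omitted) applied to a single channel. The function names `conv2d_same`, `multi_branch_forward`, and `fuse_branches` are hypothetical and chosen only for illustration.

```python
# Minimal single-channel sketch of structural re-parameterization.
# Assumption: a RepVGG-style block (3x3 + 1x1 + identity), no batch norm.
# This is an illustration, not the thesis's actual model.


def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (same output size)."""
    h, w = len(image), len(image[0])
    k = len(kernel)          # square kernel with odd side length assumed
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:   # outside the image counts as zero
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def multi_branch_forward(image, k3, k1_weight):
    """Training-time form: sum of the 3x3 branch, the 1x1 branch, and the identity."""
    y3 = conv2d_same(image, k3)
    y1 = conv2d_same(image, [[k1_weight]])
    h, w = len(image), len(image[0])
    return [[y3[i][j] + y1[i][j] + image[i][j] for j in range(w)] for i in range(h)]


def fuse_branches(k3, k1_weight):
    """Inference-time form: fold the 1x1 and identity branches into one 3x3 kernel.

    A 1x1 kernel equals a 3x3 kernel that is zero everywhere except the centre,
    and the identity equals a 1x1 kernel with weight 1, so both are added to the
    centre tap of a copy of the 3x3 kernel.
    """
    fused = [row[:] for row in k3]   # copy so the training kernel is not mutated
    fused[1][1] += k1_weight + 1.0
    return fused
```

Calling `conv2d_same(image, fuse_branches(k3, k1_weight))` should give the same output as `multi_branch_forward(image, k3, k1_weight)` up to floating-point rounding, while running one convolution instead of three branches. Under the stated assumptions, this is the sense in which the inference-time model is simpler than the training-time model.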