
Research On Multi-scale Object Detection Method Based On Deep Learning

Posted on: 2024-02-14  Degree: Master  Type: Thesis
Country: China  Candidate: Y Q Hou  Full Text: PDF
GTID: 2568307115955829  Subject: Electronic information
Abstract/Summary:
With continuing breakthroughs in deep neural networks, deep learning is increasingly applied across many fields and has become the leading direction in artificial intelligence research. Object detection, an important task within deep learning, is widely used in frontier areas such as unmanned vehicles, face recognition, and aerospace, and has significant research value. Building on the basic theories of deep learning and computer vision, an object detection network must accurately classify and localize every object in an image; ideally, like human vision, it should correctly detect all objects in input images of arbitrarily complex scenes. Multi-scale detection remains one of the open challenges in object detection. Existing models based on convolutional neural networks are not good enough at detecting objects of different sizes within the same scene, mainly for three reasons: the receptive field of the convolution kernel limits flexible perception of scale; the information of small and medium-sized objects is easily corrupted and lost; and cross-scale feature interaction is weakened because the semantic information of objects is insufficiently analyzed. To address these problems, this thesis designs a backbone network and a neck network for multi-scale object detection, using a Transformer to mine the semantic information in images in depth and adding graph neural networks to ensure cross-scale interaction and information integrity.

First, a Transformer-based backbone network for multi-scale object detection is designed. A cross-scale embedding layer performs the initial embedding of image features: the input is downsampled by multi-branch dilated convolution, and adjusting the dilation rate of the parallel branches gives the structure diverse receptive fields. The embeddings are then processed by a residual self-attention module that connects the local and global information of the feature map, so that the attention computation incorporates effective multi-scale semantic information and ultimately enables multi-scale object detection. The models are trained on datasets such as COCO, and the experimental results show that the method has significant advantages over other object detection methods.

Second, a neck network for multi-scale object detection based on graph feature fusion is designed. A feature splitting module disassembles the input image and reconstructs it as a graph, setting up edges between nodes at the same level and between related nodes at different levels to maximize the association and flow of semantic information; this converts the image from traditional Euclidean space to non-Euclidean space. Then, in the graph attention fusion module, a spatial attention mechanism and a channel attention mechanism are used to simultaneously [...]. Finally, the original features from the backbone network are fused with the features computed by graph attention and fed to the detection head to complete the multi-scale object detection task. Experiments show that this network scores better on the evaluation metrics than other advanced networks.
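
The abstract says the parallel branches of the cross-scale embedding layer differ in dilation rate, which gives them different receptive fields while they downsample the same input. The short Python sketch below works through the standard convolution-size arithmetic behind that idea. The kernel size, stride, dilation rates, input size, and all function names are illustrative assumptions, not values or code from the thesis.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def same_padding(kernel_size, dilation):
    """Padding that centers an odd-sized dilated kernel on each position."""
    return dilation * (kernel_size - 1) // 2


def conv_output_size(input_size, kernel_size, stride, padding, dilation):
    """Standard output-size formula for a strided, dilated convolution."""
    k_eff = effective_kernel_size(kernel_size, dilation)
    return (input_size + 2 * padding - k_eff) // stride + 1


def describe_branches(input_size, kernel_size=3, stride=2, dilations=(1, 2, 3)):
    """Summarize hypothetical parallel branches (assumed settings)."""
    branches = []
    for d in dilations:
        p = same_padding(kernel_size, d)
        branches.append({
            "dilation": d,
            "padding": p,
            "effective_kernel": effective_kernel_size(kernel_size, d),
            "output_size": conv_output_size(input_size, kernel_size, stride, p, d),
        })
    return branches


if __name__ == "__main__":
    # Assumed input side length of 224 pixels, chosen only for illustration.
    for branch in describe_branches(224):
        print(branch)
```

With padding chosen this way, every branch produces the same output size even though its effective kernel grows with the dilation rate, which is what allows branch outputs to be combined into a single embedding.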
Keywords/Search Tags: object detection, multi-scale, dilated convolution, Transformer, attention mechanism, graph neural network