Research On Dynamic Anchor Frame Target Detection Method Based On Attention Mechanism

Posted on:2023-12-17

Degree:Master

Type:Thesis

Country:China

Candidate:S Q Geng

Full Text:PDF

GTID:2558306902480454

Subject:Software engineering

Abstract/Summary:

Object detection is an indispensable technology in the era of artificial intelligence. Its purpose is to imitate human visual perception and locate the object regions of interest in an image. Object detection has already had a significant impact on video structuring, autonomous driving, and video content understanding. Relying on the rich representational capacity of convolutional neural networks (CNNs), more complex convolution operations and larger models have significantly improved detection performance. However, convolution treats all feature pixels equally and does not consider the importance of the image content itself, so although global concepts of the image are modeled, it remains difficult to relate spatially distant concepts. Current object detection methods are mainly divided into one-stage and two-stage detectors, and into anchor-based and anchor-free detectors. The main difference between anchor-based and anchor-free detectors is how label assignment is performed. Most detectors rely on hand-crafted prior knowledge to sample positive and negative samples, but the appearance of target objects varies greatly across scenes and categories, so this kind of sampling cannot cover the different distributions of the categories. When the model is faced with brand-new data, various parameters must be re-tuned, which reduces its generalization ability. The Transformer has become the mainstream architecture in natural language processing tasks, and its success is mainly attributed to its self-attention mechanism, but its application to visual tasks is still limited. This thesis proposes a dynamic anchor frame (anchor box) object detection method based on the attention mechanism. First, a non-convolutional backbone network, the pyramid vision Transformer, is introduced. Unlike the convolution operator, whose receptive field is limited, the vision Transformer can use global context information from the shallow layers to the deep layers. At the same time, the pyramid structure provides multi-scale features, so the model can be extended to different visual tasks such as object detection. Second, dynamic position encoding is used to adapt to input sequences of different lengths and to improve the generalization ability of the model. Third, a coarse-to-fine vision Transformer is proposed, which limits the scope of attention by introducing a global-to-local attention structure; this reduces the computational overhead of fine-grained image tokens while maintaining the ability to receive global information. Finally, this thesis proposes a dynamic anchor frame assignment method that abandons manually set hyper-parameters: it fits the quality scores of the anchor frames to a probability distribution, assigns anchor frames based on the maximum likelihood estimate of that distribution, and dynamically selects positive and negative training samples. Experiments show that the proposed object detection method based on dynamic anchor frame regression significantly improves prediction accuracy compared with traditional convolutional neural networks. This is because the vision Transformer can model the relationships within the global context and thereby extract more representative and robust features. At the same time, with dynamic anchor frame regression, positive and negative training samples are divided dynamically according to the characteristics of the objects themselves. Detection performance and generalization ability both improve, so the model can be transferred to other datasets without additional parameter tuning.
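The dynamic assignment step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify which probability distribution is fitted or how the quality score is computed, so the code below makes several assumptions: the distribution is a two-component one-dimensional Gaussian mixture, the parameters are estimated by maximum likelihood with a plain EM loop, and the function names `fit_two_gaussians` and `assign_anchors` are hypothetical. It is not the thesis implementation.

```python
import numpy as np


def fit_two_gaussians(scores, iters=50, eps=1e-6):
    """Fit a 2-component 1D Gaussian mixture to anchor quality scores with EM.

    Assumption: a 2-component mixture stands in for the unspecified
    "probability distribution" mentioned in the abstract.
    """
    scores = np.asarray(scores, dtype=float)
    # Start the means at the extremes: component 1 begins as the high-quality one.
    mu = np.array([scores.min(), scores.max()])
    var = np.full(2, scores.var() + eps)
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each score.
        diff = scores[:, None] - mu
        dens = pi * np.exp(-0.5 * diff ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: maximum likelihood updates of weights, means, and variances.
        nk = resp.sum(axis=0) + eps
        mu = (resp * scores[:, None]).sum(axis=0) / nk
        var = (resp * (scores[:, None] - mu) ** 2).sum(axis=0) / nk + eps
        pi = nk / nk.sum()
    return mu, var, pi, resp


def assign_anchors(scores):
    """Return a boolean mask: True for anchors treated as positive samples."""
    mu, _, _, resp = fit_two_gaussians(scores)
    high = int(np.argmax(mu))  # component with the higher mean quality score
    return resp[:, high] > 0.5


# Hypothetical quality scores for seven candidate anchors of one object.
scores = [0.05, 0.10, 0.08, 0.12, 0.85, 0.90, 0.88]
mask = assign_anchors(scores)
print(mask)
```

In this sketch the split between positive and negative anchors comes from the fitted mixture rather than from a fixed IoU threshold, which is the property the abstract emphasizes: no threshold has to be re-tuned when the score distribution changes.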

Keywords/Search Tags:

Object Detection, Attention Mechanism, Vision Transformer, Label Assignment

Related items

1	Research On Densely Packed Object Detection Algorithm Based On Label Assignment And Attention Mechanism
2	Research On Object Detection Algorithm Based On Multi-scale Feature In Complex Scene
3	Research On Object Detection Based On Vision Transformer
4	Research On RGB-D Salient Object Detection Based On Depth Perception And Fusion
5	Research Of Single-Stage Object Detection Algorithms Based On Deep Learning
6	Research On Transformer-based Object Detection With Local And Global Interaction
7	Design And Implementation Of Video Relationship Detection Algorithm Based On Attention Mechanism
8	Research On Object Tracking Algorithm Based On Attention And Transformer
9	Research On Object Detection Algorithm Based On Feature Pyramid Fusion And Attention Mechanism
10	Three-Dimensional Object Detection Based On Deep Learning