Research On Dynamic Anchor Frame Target Detection Method Based On Attention Mechanism

Posted on: 2023-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Geng
Full Text: PDF
GTID: 2558306902480454
Subject: Software engineering
Abstract/Summary:
Object detection is an indispensable technology in the era of artificial intelligence. Its purpose is to imitate human visual perception and locate the object regions of interest in an image. Object detection has already had a significant impact on video structuring, autonomous driving, and video content understanding. Relying on the rich representation capabilities of Convolutional Neural Networks (CNNs), more complex convolution operations and larger-scale models have significantly improved detection performance. However, convolution treats all feature pixels equally and does not consider the importance of the image content itself, so even when the global concept of the image is modeled, it is difficult to relate concepts that are spatially distant. Current object detection methods are mainly divided into one-stage and two-stage detectors, and into anchor-based and anchor-free detectors. The main difference between the latter two is how label assignment is performed. Most detectors rely on hand-crafted prior knowledge to sample positive and negative samples, but the appearance of target objects varies greatly across scenes and categories, so such sampling cannot cover the different distributions of the categories. When faced with brand-new data, various parameters must be re-tuned, which reduces the generalization ability of the model. The Transformer has become the mainstream architecture in natural language processing tasks, and its success is mainly attributed to its self-attention mechanism, but its application to visual tasks is still limited.

In this thesis, we propose a dynamic anchor frame (anchor box) object detection method based on the attention mechanism. First, a convolution-free backbone network, the pyramid vision Transformer, is introduced. Unlike the convolution operator, whose receptive field is limited, the vision Transformer can use global context information from shallow layers to deep layers. At the same time, the pyramid structure provides multi-scale features, so that the model can be extended to different visual tasks such as object detection. Second, dynamic position encoding is used to adapt to input sequences of different lengths and to improve the generalization ability of the model. Third, a coarse-to-fine vision Transformer is proposed, which limits the scope of attention by introducing a global-to-local attention structure. This reduces the computational overhead of fine-grained image tokens while preserving the ability to receive global information. Finally, this thesis proposes a dynamic anchor frame assignment method that abandons manually set hyper-parameters. It fits the quality scores of the anchor frames to a probability distribution, assigns anchor frames based on the maximum likelihood estimate of that distribution, and dynamically selects positive and negative training samples.

Experiments show that the object detection method based on dynamic anchor frame regression proposed in this thesis significantly improves prediction accuracy compared with traditional convolutional neural networks. This is because the vision Transformer can model global context relationships and therefore extracts more representative and robust features. In addition, with dynamic anchor frame regression, the positive and negative training samples are divided dynamically according to the characteristics of the object itself. Besides improving detection performance, this improves the generalization ability of the model, so that it can be transferred to other datasets without additional parameter tuning.
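The dynamic anchor frame assignment step described above can be illustrated with a small sketch. The abstract does not say which probability distribution is fitted or how the quality score is computed, so everything below is an assumption made for illustration: the scores are taken as given numbers in [0, 1], the distribution is assumed to be a two-component one-dimensional Gaussian mixture fitted by expectation-maximization (a maximum likelihood procedure), and the names `fit_two_gaussians` and `assign_anchors` are hypothetical. It is not the thesis's implementation.

```python
import math


def _weighted_pdf(s, mu, var, weight):
    """Mixture weight times the Gaussian density of score s."""
    return weight * math.exp(-(s - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(scores, iters=50):
    """Fit a 2-component 1D Gaussian mixture to anchor quality scores with EM.

    Assumption: a Gaussian mixture stands in for the unspecified
    "probability distribution" mentioned in the abstract.
    """
    lo, hi = min(scores), max(scores)
    mu = [lo, hi]  # start one component at each extreme of the scores
    var = [max((hi - lo) ** 2 / 4, 1e-6)] * 2
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each score
        resp = []
        for s in scores:
            p = [_weighted_pdf(s, mu[k], var[k], pi[k]) for k in range(2)]
            z = sum(p) or 1e-12
            resp.append([pk / z for pk in p])
        # M-step: re-estimate mean, variance, and weight of each component
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            mu[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            sq = sum(r[k] * (s - mu[k]) ** 2 for r, s in zip(resp, scores))
            var[k] = max(sq / nk, 1e-6)  # floor keeps the density finite
            pi[k] = nk / len(scores)
    return mu, var, pi


def assign_anchors(scores):
    """Return True for anchors assigned as positive samples, False otherwise.

    The component with the higher mean score is treated as the positive one;
    each anchor goes to the component that explains its score better, so no
    hand-set IoU threshold or top-k value is needed.
    """
    mu, var, pi = fit_two_gaussians(scores)
    pos = 0 if mu[0] > mu[1] else 1
    neg = 1 - pos
    return [
        _weighted_pdf(s, mu[pos], var[pos], pi[pos])
        > _weighted_pdf(s, mu[neg], var[neg], pi[neg])
        for s in scores
    ]


if __name__ == "__main__":
    # Hypothetical quality scores for eight candidate anchors of one object
    example_scores = [0.05, 0.08, 0.10, 0.12, 0.15, 0.80, 0.85, 0.90]
    print(assign_anchors(example_scores))
```

In this sketch the split between positive and negative samples follows from the fitted distribution of each object's own candidate scores, which is the property the abstract attributes to the dynamic assignment: the boundary moves with the data instead of being fixed in advance.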
Keywords/Search Tags: Object Detection, Attention Mechanism, Vision Transformer, Label Assignment