Transformers are widely used for object detection. However, compared with CNN-based detectors, Transformer-based detection models generally suffer from slow convergence. To address this problem, this paper designs model-transformation experiments and concludes from their analysis that the main factors affecting the convergence speed of Transformer detection models are region-specific sparse sampling, spatial priors, and multi-scale input. The Transformer detection model is then improved along each of these three factors. First, this paper integrates region-specific sparse sampling and relative position encoding into a lightweight redesign of the Transformer attention mechanism and, combined with spatial-prior prediction, proposes a Transformer detection model based on pre-filtered attention. Second, building on the proposed model, its feature extraction, Transformer module, and prediction network are further extended to multiple scales, which greatly shortens the convergence time of the Transformer detection model and also improves detection accuracy. The main research work and results of this paper are as follows:

(1) By comparing CNN and Transformer detection models, we found that Transformer-based detectors converge slowly. To analyze this problem, this paper designs model-transformation experiments, which identify region-specific sparse sampling, spatial priors, and multi-scale input as the main factors affecting the convergence speed of Transformer detection models. These findings provide the theoretical basis and improvement directions for this paper, and the subsequent work optimizes the Transformer detection model around these three factors.

(2) To address the slow convergence of Transformer detection models, this paper proposes an object detector based on a Transformer with pre-filtered attention, built on the ideas of region-specific sparse sampling (sketched at the end of this abstract) and spatial priors. The model replaces the original Transformer's way of processing images with a lightweight attention module, reducing computation and saving training time. A directed relative position encoding is also proposed to compensate for the relative position information lost in the attention computation. Furthermore, the model regresses bounding boxes as relative offsets to reduce learning difficulty. Experiments on the COCO dataset show that these improvements successfully accelerate convergence and relieve the burden of global modeling.

(3) The pre-filtered-attention detector is extended to multiple scales. First, hybrid multi-attention is introduced to construct multi-scale feature inputs that make full use of image features. Second, the pre-filtered attention is extended across scales to perform feature fusion and processing. In addition, a joint regression loss (a generic sketch follows below) is proposed to quickly stabilize bounding-box regression, finally yielding an accurate and efficient detection model. Experiments on the COCO and Cityscapes datasets demonstrate the model's advantages in both convergence speed and accuracy.
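The abstract names but does not define the joint regression loss. The following is a minimal sketch of one common instantiation, a weighted L1 + GIoU combination as popularized by DETR; the function names and the weights w_l1 and w_giou are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def giou_loss(pred, target):
    """1 - GIoU for boxes in (x1, y1, x2, y2) format, both shaped (N, 4)."""
    lt = torch.max(pred[:, :2], target[:, :2])   # intersection top-left
    rb = torch.min(pred[:, 2:], target[:, 2:])   # intersection bottom-right
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred[:, 2:] - pred[:, :2]).prod(dim=1)
    area_t = (target[:, 2:] - target[:, :2]).prod(dim=1)
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=1e-7)
    # smallest enclosing box; its extra area penalizes non-overlapping pairs
    enclose = (torch.max(pred[:, 2:], target[:, 2:])
               - torch.min(pred[:, :2], target[:, :2])).prod(dim=1)
    giou = iou - (enclose - union) / enclose.clamp(min=1e-7)
    return 1.0 - giou

def joint_regression_loss(pred, target, w_l1=5.0, w_giou=2.0):
    """Illustrative joint loss: L1 stabilizes early training, GIoU aligns boxes."""
    l1 = F.l1_loss(pred, target, reduction="none").sum(dim=1)
    return (w_l1 * l1 + w_giou * giou_loss(pred, target)).mean()

# Usage: build valid (x1, y1, x2, y2) boxes and compute the loss.
xy, wh = torch.rand(8, 2), torch.rand(8, 2)
pred = torch.cat([xy, xy + wh], dim=1)
loss = joint_regression_loss(pred, pred.detach() + 0.05)
```

Combining a scale-sensitive term (L1) with a scale-invariant one (GIoU) is a common way to keep gradients informative both when boxes barely overlap and when they are nearly aligned.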
In summary, to address the slow convergence of Transformer detection models, this paper integrates the ideas of region-specific sparse sampling, spatial priors, and multi-scale input, guided by the conclusions of the model-transformation experiments, and proposes a multi-scale Transformer detection model based on pre-filtered attention. The proposed model resolves the slow convergence caused by the global modeling of the attention mechanism in the original Transformer and improves detection accuracy. Extensive experiments demonstrate the advantages of the studied detection model in both convergence speed and accuracy.
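To make the idea of region-specific sparse sampling in (2) concrete, below is a minimal, deformable-attention-style sketch: each query samples a handful of learned locations around its reference point rather than attending to the whole feature map. The module name SparseSamplingAttention, the tensor shapes, and n_points are illustrative assumptions; this is not the paper's exact pre-filtered attention, which additionally incorporates the directed relative position encoding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseSamplingAttention(nn.Module):
    """Each query attends to a few bilinearly sampled points near its
    reference location instead of every position in the feature map."""
    def __init__(self, dim, n_points=4):
        super().__init__()
        self.n_points = n_points
        self.value_proj = nn.Conv2d(dim, dim, kernel_size=1)
        self.offset_proj = nn.Linear(dim, n_points * 2)  # per-query offsets
        self.weight_proj = nn.Linear(dim, n_points)      # per-query sample weights
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, queries, ref_points, feat):
        # queries: (B, Nq, C); ref_points: (B, Nq, 2) in [-1, 1]; feat: (B, C, H, W)
        B, Nq, _ = queries.shape
        value = self.value_proj(feat)
        offsets = self.offset_proj(queries).view(B, Nq, self.n_points, 2)
        weights = self.weight_proj(queries).softmax(dim=-1)          # (B, Nq, P)
        grid = (ref_points.unsqueeze(2) + offsets).clamp(-1.0, 1.0)  # (B, Nq, P, 2)
        sampled = F.grid_sample(value, grid, align_corners=False)    # (B, C, Nq, P)
        out = (sampled * weights.unsqueeze(1)).sum(dim=-1)           # (B, C, Nq)
        return self.out_proj(out.transpose(1, 2))                    # (B, Nq, C)

# Usage: 100 queries sparsely sample a 32x32 feature map.
attn = SparseSamplingAttention(dim=256)
q = torch.randn(2, 100, 256)
ref = torch.rand(2, 100, 2) * 2 - 1
out = attn(q, ref, torch.randn(2, 256, 32, 32))  # -> (2, 100, 256)
```

Because each query touches only n_points locations, the per-query cost drops from O(HW) to O(P), which is the kind of saving the abstract credits with reducing computation and shortening training time.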