
Research On Transformer-Based End-to-End Object Detection

Posted on: 2024-09-24  Degree: Master  Type: Thesis
Country: China  Candidate: J Zhao  Full Text: PDF
GTID: 2568307067493744  Subject: Signal and Information Processing
Abstract/Summary:
Object detection is a fundamental and important task in computer vision that aims to localize objects and predict their categories. Many well-established detectors based on convolutional neural networks (CNNs) achieve promising results. In recent years, the transformer has attracted considerable attention from academia and industry. Thanks to its ability to model interrelations among global information, transformer-based detectors can take full advantage of context and achieve stronger results. Region-based end-to-end transformer-like detectors, such as Sparse R-CNN, perform well. This thesis studies such detectors and proposes an end-to-end task-specific detector with IoU-enhanced attention, as well as a recursive detector based on box-location positional encoding. The contributions of this thesis are as follows:

(1) Because of the one-to-one interaction strategy between proposal features and proposal boxes, transformer-like detectors rely heavily on self-attention. As a result, proposal features easily interact with irrelevant ones, losing their distinctive identity and harming performance. This thesis proposes to use IoU as a prior to enhance self-attention: the IoU matrix computed among the proposal boxes multiplies the attention matrix, restricting which keys each query compares against, so irrelevant keys are suppressed. In addition, object detection consists of classification and regression, which focus on different regions of an object: the former focuses on the center, while the latter concentrates on the contours. We propose a dynamic channel weighting module that generates two channel masks with lightweight projection heads; the masks are multiplied with the object features to extract suitable features for the two tasks.

(2) Transformer-like detectors usually have cascade stages that progressively refine predictions toward the ground truth, and this cascade structure leads to a large number of parameters. This thesis shares the parameters across the decoder stages, which greatly reduces the model size with only a small drop in performance. Moreover, we reuse the dynamic convolution module to build an in-stage recursive structure and increase the depth of the model. A bounding-box positional encoding further boosts the recursive detector: it makes the decoder aware of the location and shape of each proposal box, so the model adapts better to proposals in different stages. We also utilize centerness to help the kernels and RoI features distinguish spatial information within the proposal box.
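The abstract does not give the exact formulation of the IoU-enhanced attention, so the following is only a minimal NumPy sketch of the idea it describes: the pairwise IoU matrix among proposal boxes gates the softmaxed attention weights, so keys whose boxes barely overlap a query's box are suppressed. The function names and the renormalization step are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def pairwise_iou(boxes):
    """IoU matrix among N boxes given as (x1, y1, x2, y2)."""
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area[:, None] + area[None, :] - inter
    return inter / union

def iou_enhanced_attention(q, k, v, boxes):
    """Self-attention over proposal features, with the score matrix
    multiplied element-wise by the IoU prior among proposal boxes."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    attn = attn * pairwise_iou(boxes)                     # IoU prior gates the weights
    attn = attn / (attn.sum(axis=-1, keepdims=True) + 1e-6)
    return attn @ v
```

With fully disjoint boxes, the IoU prior reduces each row to self-attention only, which is exactly the "suppress irrelevant keys" behavior the abstract describes.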
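The dynamic channel weighting module is described only at a high level (two channel masks from a lightweight projection head, multiplied with the object features). A toy NumPy sketch under that description, with randomly initialized projection weights standing in for the learned heads:

```python
import numpy as np

def dynamic_channel_weighting(features, w_cls, w_reg):
    """Gate object features into task-specific variants.

    features: (N, C) object features
    w_cls, w_reg: (C, C) projection weights (learned in the real model;
                  placeholders here) producing sigmoid channel masks.
    Returns (classification features, regression features), each (N, C).
    """
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    mask_cls = sigmoid(features @ w_cls)  # channels useful for classification
    mask_reg = sigmoid(features @ w_reg)  # channels useful for regression
    return features * mask_cls, features * mask_reg
```

Because the masks are sigmoid outputs in (0, 1), each branch sees a softly re-weighted copy of the same features rather than a hard channel split.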
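The box-location positional encoding is likewise only named, not specified. A common construction (assumed here, not taken from the thesis) is a sinusoidal encoding of the box center, width, and height, so the decoder sees both location and shape:

```python
import numpy as np

def sine_box_positional_encoding(boxes, dim=64, temperature=10000.0):
    """Sinusoidal positional encoding of proposal boxes.

    boxes: (N, 4) in (x1, y1, x2, y2) format.
    Each of (cx, cy, w, h) is expanded into dim/4 sin/cos channels,
    yielding an (N, dim) encoding to add to the proposal features.
    """
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    coords = np.stack([cx, cy, w, h], axis=-1)        # (N, 4)
    n = dim // 8                                      # frequencies per coordinate
    freqs = temperature ** (np.arange(n) / n)         # (n,)
    angles = coords[..., None] / freqs                # (N, 4, n)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(len(boxes), -1)                # (N, dim)
```

Because the encoding is a pure function of the current boxes, it can be recomputed at every cascade or recursive stage, which matches the abstract's point that it keeps the decoder adaptive to proposals as they are refined.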
Keywords/Search Tags:Deep Learning, Object Detection, Transformer, Attention, Positional Encoding