
Research on an Anchor-Based Object Detection Method for a Fully Transformer Framework

Posted on: 2024-09-01    Degree: Master    Type: Thesis
Country: China    Candidate: F Chen    Full Text: PDF
GTID: 2568306923452264    Subject: Computer technology
Abstract/Summary:
With the continuous advancement of deep learning and computer vision research, Transformer models have matched or even surpassed the performance of convolutional neural networks in several areas of computer vision. Object detection, in particular, is a key research direction with broad market prospects in important areas such as surveillance, violation detection, and medical image analysis. Because convolutional neural networks matured earlier, current mainstream object detectors show relatively limited structural diversity: they mainly adopt either fully convolutional architectures or hybrid architectures that combine convolutional neural networks with Transformers, and detectors with a complete Transformer architecture are lacking. Given the potential and performance that Transformers have exhibited relative to convolutional neural networks, this thesis explores a fully Transformer-based object detection network, called AnchorFormer, with the aim of further advancing Transformer methods.

To address these issues, this thesis conducts the following research. It designs an object detector, AnchorFormer, with a fully Transformer architecture: a Transformer encoder serves as the backbone network for feature extraction, while a Transformer decoder serves as the prediction head for detection. While keeping the Transformer encoder-decoder structure unchanged, the thesis proposes a novel object detection method based on anchor points and anchor boxes. By redesigning the decoder's prediction head and introducing fixed-region one-to-one predictions, anchor points, and anchor boxes as prior conditions and inductive biases, the fully Transformer object detector achieves good performance even on small and medium-sized datasets. Convergence is further accelerated by adding a point-wise score loss to the category prediction branch. In addition, the thesis investigates two different approaches, anchor-free and anchor-based, to explore the influence of anchor points and anchor boxes on the Transformer detection model, and designs new sample matching methods and conflict resolution schemes for both. Furthermore, it introduces a new non-convolutional feature fusion module, called Layer Merging, between the Transformer encoder and decoder to replace convolutional feature fusion modules such as feature pyramid networks and path aggregation networks.

Multiple sets of controlled experiments demonstrate the effectiveness and superiority of the proposed detector and detection methods. Comparisons verify that the Layer Merging module can replace feature pyramid networks and path aggregation networks for feature fusion. Comparisons between the point-wise score loss and a confidence loss show that the point-wise score loss benefits model convergence, while the confidence loss is better suited to object prediction. Comparing CIoU loss with the LTRB method mitigates the strong dependency of small-object prediction on the predicted object center, yielding a 1.8% improvement in AP_s on the COCO dataset. Ablation experiments compare the proposed fully Transformer object detector with the DETR baseline: under the same experimental conditions on the small-to-medium-sized VOC0712 dataset, the proposed model improves mAP by 6% and converges after 30 training epochs, whereas DETR requires 150 epochs, an 80% improvement in convergence speed.
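The anchor priors described above can be illustrated with a minimal sketch (NumPy; all names, grid sizes, and box scales here are hypothetical illustrations, not the thesis's actual implementation): a fixed, uniform grid of anchor points over the image, with one or more default anchor boxes attached to each point. In a DETR-style decoder such (cx, cy, w, h) priors would typically be embedded and combined with the object queries, giving each query a fixed spatial region in which to predict.

```python
import numpy as np

def make_anchor_priors(grid_size=10, box_scales=(0.1, 0.3)):
    """Build a uniform grid_size x grid_size grid of anchor points in
    [0, 1]^2 and attach one default (w, h) anchor box per scale to each
    point.  Returns (points, boxes), boxes in (cx, cy, w, h) format."""
    # Cell centres, normalised to [0, 1]
    xs = (np.arange(grid_size) + 0.5) / grid_size
    ys = (np.arange(grid_size) + 0.5) / grid_size
    cx, cy = np.meshgrid(xs, ys)
    points = np.stack([cx.ravel(), cy.ravel()], axis=1)   # (grid^2, 2)

    boxes = []
    for s in box_scales:
        wh = np.full_like(points, s)                      # same (w, h) at every point
        boxes.append(np.concatenate([points, wh], axis=1))
    boxes = np.concatenate(boxes, axis=0)                 # (grid^2 * n_scales, 4)
    return points, boxes

points, boxes = make_anchor_priors()
print(points.shape, boxes.shape)  # (100, 2) (200, 4)
```

Fixing the priors this way is what supplies the inductive bias the abstract mentions: unlike DETR's freely learned queries, each prediction is tied to a known region from the start, which is one plausible reason such models can converge faster on smaller datasets.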
Keywords/Search Tags:Transformer, DETR, Anchor Point, Anchor Box, Object detection