
Research on an Anchor-Based Object Detection Method for a Fully Transformer Framework

Posted on: 2024-09-01    Degree: Master    Type: Thesis
Country: China    Candidate: F Chen    Full Text: PDF
GTID: 2568306923452264    Subject: Computer technology
Abstract/Summary:
With the continuous advancement of deep learning and computer vision research, Transformer models have matched or even surpassed the performance of convolutional neural networks in several areas of computer vision. Object detection, in particular, is a key research direction with broad market prospects in important areas such as surveillance, violation detection, and medical image analysis. Because convolutional neural networks matured earlier, current mainstream object detectors show relatively limited structural diversity: they mainly adopt either fully convolutional architectures or hybrid architectures that combine convolutional neural networks with Transformers, and detectors with a complete Transformer architecture are lacking. Given the potential and performance that Transformers have exhibited relative to convolutional neural networks, this thesis explores a fully Transformer-based object detection network, called AnchorFormer, with the aim of further advancing Transformer methods.

To address these issues, this thesis conducts the following research. It designs an object detector, AnchorFormer, with a fully Transformer architecture: a Transformer encoder serves as the backbone network for feature extraction, while a Transformer decoder serves as the prediction head for detection. While keeping the Transformer encoder-decoder structure unchanged, the thesis proposes a novel object detection method based on anchor points and anchor boxes. By redesigning the decoder's prediction head and introducing fixed-region one-to-one predictions, anchor points, and anchor boxes as prior conditions and inductive biases, the fully Transformer object detector achieves good performance even on small and medium-sized datasets. Convergence is further accelerated by adding a point-wise score loss to the category prediction branch. In addition, the thesis investigates two different approaches, anchor-free and anchor-based, to explore the influence of anchor points and anchor boxes on the Transformer detection model, and designs new sample matching methods and conflict resolution schemes for both. Furthermore, it introduces a new non-convolutional feature fusion module, called Layer Merging, between the Transformer encoder and decoder to replace convolutional feature fusion modules such as feature pyramid networks and path aggregation networks.

Multiple sets of controlled experiments demonstrate the effectiveness and superiority of the proposed detector and detection methods. Comparisons verify that the Layer Merging module can replace feature pyramid networks and path aggregation networks for feature fusion. Comparisons between the point-wise score loss and a confidence loss show that the point-wise score loss benefits model convergence, while the confidence loss is better suited to object prediction. Comparing CIoU loss with the LTRB method mitigates the strong dependency of small-object prediction on the predicted object center, yielding a 1.8% improvement in AP_s on the COCO dataset. Ablation experiments compare the proposed fully Transformer object detector with the DETR baseline: under the same experimental conditions on the small-to-medium-sized VOC0712 dataset, the proposed model improves mAP by 6% and converges after 30 training epochs, whereas DETR requires 150 epochs, an 80% improvement in convergence speed.
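The anchor priors described above can be illustrated with a minimal sketch (NumPy; all names, grid sizes, and box scales here are hypothetical illustrations, not the thesis's actual implementation): a fixed, uniform grid of anchor points over the image, with one or more default anchor boxes attached to each point. In a DETR-style decoder such (cx, cy, w, h) priors would typically be embedded and combined with the object queries, giving each query a fixed spatial region in which to predict.

```python
import numpy as np

def make_anchor_priors(grid_size=10, box_scales=(0.1, 0.3)):
    """Build a uniform grid_size x grid_size grid of anchor points in
    [0, 1]^2 and attach one default (w, h) anchor box per scale to each
    point.  Returns (points, boxes), boxes in (cx, cy, w, h) format."""
    # Cell centres, normalised to [0, 1]
    xs = (np.arange(grid_size) + 0.5) / grid_size
    ys = (np.arange(grid_size) + 0.5) / grid_size
    cx, cy = np.meshgrid(xs, ys)
    points = np.stack([cx.ravel(), cy.ravel()], axis=1)   # (grid^2, 2)

    boxes = []
    for s in box_scales:
        wh = np.full_like(points, s)                      # same (w, h) at every point
        boxes.append(np.concatenate([points, wh], axis=1))
    boxes = np.concatenate(boxes, axis=0)                 # (grid^2 * n_scales, 4)
    return points, boxes

points, boxes = make_anchor_priors()
print(points.shape, boxes.shape)  # (100, 2) (200, 4)
```

Fixing the priors this way is what supplies the inductive bias the abstract mentions: unlike DETR's freely learned queries, each prediction is tied to a known region from the start, which is one plausible reason such models can converge faster on smaller datasets.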
Keywords/Search Tags:Transformer, DETR, Anchor Point, Anchor Box, Object detection