
Pedestrian Multi-Object Tracking With Transformer And Enhanced Association Strategy

Posted on: 2024-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Wang
GTID: 2568306944950009
Subject: Electronic information
Abstract/Summary:
Because it balances tracking accuracy and speed, the joint detection and embedding (JDE) paradigm has become a typical multi-object tracking (MOT) method. JDE learns object detection and identity (ID) embedding simultaneously in a unified framework, which allows the subtasks to be optimized jointly and reduces computation cost. However, MOT still faces many challenges: frequent occlusions, similar appearances of and interactions among multiple objects, and scale changes caused by pedestrian movement. Current JDE methods have two drawbacks: (1) most JDE trackers ignore global information, including target-target and target-background interactions, which are crucial cues for data association; (2) data association algorithms rely heavily on detector performance, and the similarity matrix representation is not accurate enough, which leads to identity switches when objects occlude one another as their paths cross.

First, to address the problems caused by large scale changes and similar appearances of objects, a Transformer is introduced into the JDE framework, using global information to augment the feature representation and adapt to changes in object scale and appearance. A CSP Transformer module is proposed to enhance feature extraction; it replaces the convolution layers in the final block of the backbone. In the backbone, convolutions first learn low-resolution feature maps efficiently from large images, and multi-head self-attention with relative position encodings then processes and aggregates the local feature maps captured by the convolutions. The CSP Transformer module enhances the backbone's feature extraction while reducing computation cost. In addition, a multi-scale Transformer is introduced into MOT for the first time: this thesis constructs a lightweight Scale-aware Transformer module that aggregates multi-scale feature maps and produces scale-aware global information to augment the feature representation. The Scale-aware Transformer module therefore improves robustness to objects at multiple scales, which indirectly improves tracking performance without heavy computation cost.

Second, to address false associations caused by occluded objects, an enhanced association strategy is proposed to reduce the impact of unreliable detection boxes and imprecise association under occlusion. Detection boxes are divided into low-score and high-score boxes according to a confidence threshold, and matching is performed hierarchically, making full use of the location information in low-score detection boxes. In addition, to reduce fragmented trajectories and identity switches caused by occlusion, a detection recovery mechanism recovers these unreliable detections by exploiting the temporal consistency of appearance features. Furthermore, the enhanced data association strategy adopts a more accurate similarity measure, the CIoU (Complete IoU) distance, and redesigns the CIoU-Embedding similarity matrix to reduce the complexity of the appearance measurement. The enhanced data association strategy effectively improves matching accuracy while maintaining tracking speed. Experimental results show that the proposed method achieves competitive detection performance.
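The abstract does not give the attention formula used inside the CSP Transformer module. One common way to add relative position encodings to self-attention over a 2D feature map (2D relative self-attention as used in convolutional backbones) is shown below for a single head; the notation is introduced here as an assumption, not taken from the thesis. For spatial positions $a$ and $b$ with coordinates $(x_a, y_a)$ and $(x_b, y_b)$ and features $f_a$, $f_b$, learned embeddings $r^{H}$ and $r^{W}$ are indexed by the row and column offsets between the two positions:

```latex
\begin{aligned}
q_a &= W^{Q} f_a, \quad k_b = W^{K} f_b, \quad v_b = W^{V} f_b, \\
\ell_{ab} &= \frac{q_a^{\top} k_b + q_a^{\top}\left(r^{H}_{y_b - y_a} + r^{W}_{x_b - x_a}\right)}{\sqrt{d_k}}, \\
o_a &= \sum_{b} \operatorname{softmax}_b\!\left(\ell_{ab}\right) v_b .
\end{aligned}
```

In multi-head self-attention, each of the $h$ heads has its own projections $W^{Q}$, $W^{K}$, $W^{V}$; the head outputs are concatenated and linearly projected. Since every position attends to every other, the cost grows quadratically with $H \cdot W$, which is why applying attention to the low-resolution maps produced by the convolutional stages keeps it affordable.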
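The hierarchical matching of high-score and low-score detection boxes can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: boxes are `(x1, y1, x2, y2)` tuples, the cost is plain IoU distance rather than the CIoU-Embedding matrix, greedy assignment stands in for the Hungarian algorithm usually used in practice, and the function names and threshold values (`hierarchical_match`, `score_thresh`, `min_score`, `max_cost`) are hypothetical. Tracks are first matched to high-score detections; the leftover tracks are then matched to low-score detections by location alone, so an occluded pedestrian with a low confidence score can still extend its trajectory.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, dets, max_cost):
    """Greedily pair tracks and detections by ascending IoU distance (1 - IoU).

    Simplification: real trackers usually solve this with the Hungarian algorithm.
    """
    candidates = sorted(
        (1.0 - iou(t, d), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(dets)
    )
    used_t, used_d, matches = set(), set(), []
    for cost, ti, di in candidates:
        if cost > max_cost:
            break  # all remaining pairs are even worse
        if ti in used_t or di in used_d:
            continue
        used_t.add(ti)
        used_d.add(di)
        matches.append((ti, di))
    unmatched_t = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_d = [i for i in range(len(dets)) if i not in used_d]
    return matches, unmatched_t, unmatched_d


def hierarchical_match(track_boxes, detections, score_thresh=0.6, min_score=0.1, max_cost=0.8):
    """Two-stage matching; detections is a list of (box, score) pairs.

    Returns (matches, unmatched_track_indices, new_track_detection_indices), where
    matches are (track_index, detection_index) pairs. Threshold defaults are hypothetical.
    """
    high = [i for i, (_, s) in enumerate(detections) if s >= score_thresh]
    low = [i for i, (_, s) in enumerate(detections) if min_score <= s < score_thresh]

    # Stage 1: every track against the high-score (reliable) detections.
    m1, left_tracks, left_high = greedy_match(
        track_boxes, [detections[i][0] for i in high], max_cost)
    matches = [(t, high[d]) for t, d in m1]

    # Stage 2: leftover tracks against low-score detections, using location only.
    m2, still_left, _ = greedy_match(
        [track_boxes[t] for t in left_tracks], [detections[i][0] for i in low], max_cost)
    matches += [(left_tracks[t], low[d]) for t, d in m2]

    unmatched_tracks = [left_tracks[t] for t in still_left]
    new_track_dets = [high[d] for d in left_high]  # unmatched low-score boxes are dropped
    return matches, unmatched_tracks, new_track_dets


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [((1, 1, 11, 11), 0.9), ((21, 21, 31, 31), 0.3), ((50, 50, 60, 60), 0.8)]
print(hierarchical_match(tracks, dets))  # → ([(0, 0), (1, 1)], [], [2])
```

In this example the second track is recovered only in stage 2, through the low-score box at index 1, while the unmatched high-score box at index 2 becomes a candidate for a new track. Unmatched low-score boxes are discarded because, without a track to support them, they are more likely to be background.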
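The CIoU distance follows the standard Complete-IoU definition, which extends IoU with a centre-distance penalty and an aspect-ratio consistency term: CIoU = IoU − ρ²/c² − αv, where ρ is the distance between box centres, c is the diagonal of the smallest enclosing box, v = (4/π²)(arctan(w₁/h₁) − arctan(w₂/h₂))², and α = v / ((1 − IoU) + v). The sketch below implements that definition in plain Python. The abstract does not say how the CIoU-Embedding similarity matrix is built, so `fused_cost_matrix` and its `weight` parameter are a hypothetical linear blend of CIoU distance and cosine embedding distance, included only to illustrate combining location and appearance cues in one matrix.

```python
import math


def ciou(a, b):
    """Complete IoU of two boxes (x1, y1, x2, y2); assumes positive widths and heights."""
    wa, ha = a[2] - a[0], a[3] - a[1]
    wb, hb = b[2] - b[0], b[3] - b[1]

    # Plain IoU: intersection area over union area.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    iou = inter / (wa * ha + wb * hb - inter)

    # rho^2: squared distance between the two box centres.
    rho2 = ((a[0] + a[2]) - (b[0] + b[2])) ** 2 / 4 + ((a[1] + a[3]) - (b[1] + b[3])) ** 2 / 4
    # c^2: squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2

    # v: aspect-ratio consistency; alpha: its trade-off weight (0 when aspect ratios match).
    v = 4 / math.pi ** 2 * (math.atan(wa / ha) - math.atan(wb / hb)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return iou - rho2 / c2 - alpha * v


def cosine_distance(u, w):
    """1 - cosine similarity of two appearance embeddings."""
    dot = sum(x * y for x, y in zip(u, w))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in w))
    return 1.0 - dot / norm if norm > 0 else 1.0


def fused_cost_matrix(track_boxes, track_embs, det_boxes, det_embs, weight=0.5):
    """HYPOTHETICAL fused cost: weight * (1 - CIoU) + (1 - weight) * cosine distance.

    Rows are tracks, columns are detections; a lower cost means a better match.
    """
    return [
        [weight * (1.0 - ciou(tb, db)) + (1.0 - weight) * cosine_distance(te, de)
         for db, de in zip(det_boxes, det_embs)]
        for tb, te in zip(track_boxes, track_embs)
    ]


cost = fused_cost_matrix([(0, 0, 2, 2)], [[1.0, 0.0]],
                         [(0, 0, 2, 2), (4, 0, 6, 2)], [[1.0, 0.0], [0.0, 1.0]])
print(cost)  # roughly [[0.0, 1.2]]
```

Unlike plain IoU, CIoU still separates boxes that do not overlap (the second detection above has IoU 0 but CIoU −0.4), which helps when a fast-moving or briefly occluded pedestrian drifts away from its predicted box. Because CIoU can be negative, the distance 1 − CIoU can exceed 1, which matters when choosing a gating threshold.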
Keywords/Search Tags: Multi-Object Tracking, Joint Detection and Embedding, Transformer, Multi-Head Self-Attention, Data Association