
Research On Visual Object Tracking Algorithm Based On Deep Representation Learning

Posted on: 2023-06-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Yang
GTID: 1528307376485104
Subject: Computer application technology
Abstract/Summary:
Visual object tracking is a fundamental research topic in computer image processing that aims to estimate the shapes and locations of target objects in video sequences. Object tracking has been widely used in augmented reality, autonomous driving, and human-computer interaction. In recent years, object tracking methods based on deep representation learning have made significant progress. The region proposal network, a common tracking approach built on a deep representation learning framework, has attracted extensive attention because it achieves a good balance between accuracy and speed. Current tracking models have made some progress in offline learning of local representations, online learning of local representations, and online learning of global representations. However, the following problems remain. First, at the level of offline learning of local representations, current region proposal networks based on offline training require anchor boxes to be pre-designed manually. This design introduces many anchor boxes and causes an imbalance between positive and negative samples, which negatively affects the model’s training. Second, at the level of online learning of local representations, online learning methods based on a discriminative filter lack real-time updating of the target’s template features and a reasonable division of positive and negative samples during training, which leads to low robustness. Third, at the level of online learning of global representations, Transformer-enhanced correlation filter methods suffer from misalignment between classification scores and regression predictions and do not effectively extract the boundary features of targets. To address these problems, this thesis advances the object tracking task from offline learning of local representations to online learning of local representations and online learning of global representations, improving and enriching research in the tracking field. The main research contents of this thesis include:

(1) Focusing on the complex design of anchor boxes in the region proposal network, this thesis presents a local representation algorithm based on a Siamese corner network. The proposed Siamese method uses a modified corner pooling layer to transform target box estimation into the prediction of a pair of corners (top-left and bottom-right). By turning the state estimation of the tracked object into corner prediction, the method eliminates the design problems related to the number, size, and aspect ratio of anchors, which makes it more flexible and general than traditional anchor-based methods. In terms of network design, the corner prediction network is attached to the multi-level convolutional outputs of the deep network, so the model can predict multiple candidate corners from both the deep and shallow features. A new penalty function is then introduced to select the optimal candidate corners as the tracking target box. Experimental results show that the proposed anchor-free method achieves state-of-the-art performance while maintaining a high running speed.

(2) To address the assignment of positive and negative samples in region proposal networks, this thesis proposes a local representation method based on probabilistic anchor assignment, which adaptively assigns positive and negative samples according to the current learning state of the model. The probabilistic anchor assignment first calculates the classification score of each anchor and then fits a probability distribution model to these scores. In this way, the tracking model divides positive and negative samples probabilistically during training. Furthermore, the probabilistic model introduces an online learning method that enables the tracking model to acquire a powerful representation ability by mining background representation information. The online learning model also uses a hard negative mining strategy that mines as many hard negative examples as possible and adds them to the negative sample set, which gives better results than a set composed of simple negative samples. Experimental results show that the proposed probabilistic anchor assignment outperforms the traditional fixed IoU threshold strategy.

(3) To address the problem of template updating in the tracking model, this thesis proposes a local representation algorithm based on a template-guided attention network. The proposed algorithm comprehensively utilizes template and search feature information and provides an implicit template update method. A simple template update algorithm makes the model more robust to challenging factors such as occlusion and deformation. Furthermore, channel attention and spatial attention models are introduced to extract the key regions of the target in the channel and spatial dimensions, respectively. Deformable convolutional networks enhance the generalization ability of the model when the tracking target undergoes deformation or changes in aspect ratio and scale. Finally, the template updating model proposes a template-guided attention network that efficiently aggregates template and search image features and lets them interact to obtain rich feature information. Experimental results show that the proposed template-guided attention network improves the model’s performance.

(4) To address the misalignment of classification scores and regression predictions in traditional trackers, this thesis proposes an IoU-aware global representation algorithm with adaptive sample assignment. Firstly, the proposed IoU-aware tracker introduces a new IoU-focal loss to train the classification network, which reduces the weights of negative samples and assigns different loss weights based on the IoU values between the positive samples and the ground truth boxes. Secondly, the IoU-aware tracker introduces an adaptive sample assignment that divides the positive and negative samples based on the statistical properties of the training samples (mean and variance values). Thirdly, a star-shaped feature representation is developed to capture the geometric information of the target box and its surrounding context, which is essential for resolving the misalignment between the predicted and ground truth boxes. Experimental results show that the proposed method achieves state-of-the-art results by using adaptive sample assignment strategies and by addressing the misalignment of classification scores and regression predictions.

(5) Focusing on the temporal information in videos and the boundary features of objects, this thesis proposes a global representation algorithm based on a border-aware network with a deformable Transformer. Firstly, the global representation algorithm introduces a border-aware network in the classification and regression branches of the tracking framework, which effectively extracts the boundary features of the target by using the BorderAlign operation and thereby improves the accuracy of target localization. Secondly, the proposed method introduces a deformable Transformer that enhances the template and search image features. Its encoder attends only to a small number of key sampling points around the reference location on the feature map rather than to every pixel position of the entire feature map. The Transformer encoder enhances the multiple template features and generates high-quality encoder features. The Transformer decoder propagates previous template features to the current frame, which simplifies the search for the tracked object. Comparison results on large tracking datasets show that the proposed method achieves state-of-the-art tracking results.
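As an illustration of the corner pooling idea in contribution (1), the following is a minimal sketch of standard top-left corner pooling in the CornerNet style. The thesis uses a modified corner pooling layer whose details the abstract does not give, so this is not the author's layer. The function name and the two input maps (`map_h`, `map_v`) are hypothetical.

```python
def top_left_corner_pool(map_h, map_v):
    """Standard (CornerNet-style) top-left corner pooling on 2D lists.

    map_h is max-pooled from each position towards the right edge,
    map_v is max-pooled from each position towards the bottom edge,
    and the two pooled maps are summed. Names are illustrative only.
    """
    h, w = len(map_h), len(map_h[0])

    # Running maximum scanning each row from right to left.
    right_max = [row[:] for row in map_h]
    for i in range(h):
        for j in range(w - 2, -1, -1):
            right_max[i][j] = max(right_max[i][j], right_max[i][j + 1])

    # Running maximum scanning each column from bottom to top.
    down_max = [row[:] for row in map_v]
    for i in range(h - 2, -1, -1):
        for j in range(w):
            down_max[i][j] = max(down_max[i][j], down_max[i + 1][j])

    return [[right_max[i][j] + down_max[i][j] for j in range(w)]
            for i in range(h)]


# Toy example: evidence for the top edge at (row 1, col 3) and for the
# left edge at (row 3, col 1) should vote for a top-left corner at (1, 1).
map_h = [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
map_v = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0]]
pooled = top_left_corner_pool(map_h, map_v)
best = max(((pooled[i][j], (i, j)) for i in range(4) for j in range(4)))
corner = best[1]
```

A bottom-right corner would be pooled the same way in the opposite directions, and the pair of corners then defines the target box without any anchors.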
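The IoU-focal loss in contribution (4) is described only qualitatively: negatives are down-weighted and positives are weighted by their IoU with the ground truth. A minimal sketch of one loss with these two properties follows. It assumes a varifocal-style form, and the function name and the `alpha` and `gamma` defaults are illustrative, not taken from the thesis.

```python
import math


def iou_weighted_focal_loss(p, iou, alpha=0.75, gamma=2.0, eps=1e-12):
    """Per-sample classification loss (assumed varifocal-style form).

    p   : predicted foreground probability in (0, 1)
    iou : IoU of the sample with the ground truth box; 0 marks a negative
    """
    p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
    if iou > 0:
        # Positive sample: cross-entropy against the soft target `iou`,
        # weighted by `iou` so better-localized samples count more.
        bce = -(iou * math.log(p) + (1.0 - iou) * math.log(1.0 - p))
        return iou * bce
    # Negative sample: focal down-weighting of easy negatives.
    return -alpha * (p ** gamma) * math.log(1.0 - p)


loss_high_iou = iou_weighted_focal_loss(0.5, 0.9)
loss_low_iou = iou_weighted_focal_loss(0.5, 0.3)
loss_easy_negative = iou_weighted_focal_loss(0.1, 0.0)
```

With the same predicted score, the positive with IoU 0.9 contributes more loss than the one with IoU 0.3, while the easy negative contributes far less than its plain cross-entropy value.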
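The adaptive sample assignment in contribution (4) splits samples using the mean and variance of the training samples. One common way to do this, assumed here because the abstract does not state the exact rule, is to set the positive threshold to the mean plus one standard deviation of the candidates' IoU values. The helper names below are hypothetical.

```python
from statistics import mean, pstdev


def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def adaptive_assign(candidates, gt_box):
    """Label each candidate 1 (positive) or 0 (negative).

    Assumed rule: threshold = mean + standard deviation of the IoUs,
    so the split adapts to the candidates instead of a fixed cut-off.
    """
    ious = [iou(c, gt_box) for c in candidates]
    threshold = mean(ious) + pstdev(ious)
    labels = [1 if v >= threshold else 0 for v in ious]
    return labels, threshold


gt = (0, 0, 10, 10)
candidates = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30), (0, 0, 10, 5)]
labels, threshold = adaptive_assign(candidates, gt)
```

In this toy case the threshold is about 0.80, so only the candidate that coincides with the ground truth box is labeled positive; a fixed threshold of 0.5 would also have accepted the half-height box.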
Keywords/Search Tags: Object Tracking, Representation Learning, Local Representation, Global Representation