
Research on Real-Time Object Tracking with Adaptive Template Update Based on Deep Features

Posted on: 2022-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Zhang
Full Text: PDF
GTID: 2558306914978879
Subject: Information and Communication Engineering
Abstract/Summary:
Over the past few decades, target tracking has been a very active research direction in computer vision, and it has been widely applied in fields such as video surveillance, autonomous driving, human-computer interaction, and augmented reality. However, current target tracking technology still faces many challenging problems. First, existing template update strategies are still immature: when the target is partially occluded, deformed, or surrounded by background clutter in subsequent frames, the tracker is prone to drift. Second, feature extraction networks are relatively shallow: most real-time tracking networks still use AlexNet as the backbone and do not make effective use of the powerful feature representation capabilities of deep convolutional networks. Third, the L2 loss function used in training makes the model prone to falling into local optima, which makes it difficult for the tracker to locate the target accurately. To address these three issues, this thesis improves on the current mainstream target tracking networks. The main innovations and contributions are the following three parts.

(1) An adaptive template update network is proposed. The network takes the initial template of the target in the first frame, the accumulated template, and the predicted templates at different moments as the input to an inter-frame residual calculation module. This module computes the residuals among these templates to obtain the amount of update actually required for the current frame, which is then used to adaptively update the template for the current frame. Experiments show that the model performs best when the skip connection of the inter-frame residual calculation module is connected to the initial template of the first frame. In addition, to avoid a cumbersome and inefficient training process, an N-step iterative training method is introduced in the training phase of the adaptive template update network. At the beginning of iterative training, the tracker uses a standard linear update strategy to provide the data needed for the next round of training of the adaptive template update network, including the accumulated templates and the predicted templates at different times. By analogy, the data obtained in step N, together with the initial template, form three sets of data pairs that serve as the training data for the adaptive template update model in step N+1. Experiments show that the model performs best when N is 3. The proposed adaptive template update network is applied to SiamFC, SiamRPN, and SiamDW, and on the VOT2016 and VOT2017 datasets its performance exceeds that of the original trackers, which indicates that the proposed network can effectively predict the change of the target in each frame.

(2) A deep feature extraction network is proposed. Based on an extensive review of the literature, the thesis summarizes several reasons why the structure of a deep network (the feature extraction part) affects the accuracy of the target tracking task, and gives corresponding design principles. DenseNet is modified according to these principles to obtain a feature extraction network suited to target tracking. Specifically, a crop operation is added after each DenseBlock in DenseNet to cut away the zero padding introduced by the convolution operations. After the crop, a max pooling operation with a stride of 2 is applied to ensure that the output of each DenseBlock has the same size. The AlexNet originally used in SiamFC and SiamRPN is replaced with the proposed deep feature extraction network, and a large number of experiments are conducted on the VOT dataset. The results are compared with trackers that use other feature extraction networks (AlexNet, ResNet). In terms of accuracy and speed, the improved tracker is superior to the other trackers, which further verifies the effectiveness of the deep feature extraction network designed in this thesis for target tracking tasks.

(3) The Smooth L1 loss function is used for network training. Existing nonlinear template update mechanisms generally use the L2 loss function rather than the L1 loss function in the training phase, because the L2 loss converges much faster than the L1 loss. However, squaring the error, as the L2 loss does, not only widens the gap between the largest and smallest errors but also makes the function very sensitive to outliers. Therefore, this thesis uses the Smooth L1 loss function, which combines the advantages of the two, to train the proposed adaptive template update network, and conducts ablation experiments on the public VOT test set. The experimental results demonstrate the effectiveness of the Smooth L1 loss in improving the performance of the adaptive template update network.
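The contrast between the three loss functions described in part (3) can be illustrated with a minimal sketch. The abstract does not give the exact formulation used in the thesis, so the code below assumes the commonly used Smooth L1 definition with a transition point `beta` (quadratic for errors below `beta`, linear above it). The function names, the default `beta=1.0`, and the sample values are illustrative assumptions, not taken from the thesis.

```python
def l1_loss(error):
    """Absolute error: robust to outliers, but with a constant gradient near zero."""
    return abs(error)


def l2_loss(error):
    """Squared error: smooth near zero, but large errors are amplified."""
    return error * error


def smooth_l1_loss(error, beta=1.0):
    """Assumed Smooth L1 form: quadratic below beta, linear at or above beta."""
    magnitude = abs(error)
    if magnitude < beta:
        return 0.5 * magnitude * magnitude / beta
    return magnitude - 0.5 * beta


def mean_smooth_l1(predicted, target, beta=1.0):
    """Average Smooth L1 loss over paired template values (hypothetical helper)."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    total = sum(smooth_l1_loss(p - t, beta) for p, t in zip(predicted, target))
    return total / len(predicted)


if __name__ == "__main__":
    # Small errors behave like L2; the large error (an outlier) grows only linearly.
    for error in (0.1, 0.5, 1.0, 4.0):
        print(error, l1_loss(error), l2_loss(error), smooth_l1_loss(error))
```

For an error of 4.0, the L2 loss is 16.0 while the Smooth L1 loss under this definition is 3.5, which shows why a single outlier dominates training far less. For small errors the quadratic branch keeps the gradient smooth near zero, as with L2.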
Keywords/Search Tags: Computer vision, Object tracking, Template update, Deep features, Loss function