| Visual object tracking is one of the most challenging fundamental problems in computer vision. Given the initial state of an object, the tracking task aims to accurately estimate the object's location in every subsequent frame of a video. In recent years, owing to the practical significance of long-term tracking, the research focus of visual object tracking has shifted from short-term to long-term tracking. Long-term tracking has two important characteristics. First, unlike short-term tracking sequences, which last only hundreds of frames, long-term tracking sequences usually reach thousands of frames, with an average duration of at least a minute. Second, long-term tracking scenarios frequently contain challenging attributes such as targets leaving the field of view, complete occlusion, and interference from similar objects, which places high demands on the robustness of the tracking algorithm. At the same time, because long-term tracking is application-oriented, real-time operation is also an important requirement. Designing a tracking algorithm that remains robust, accurate, and efficient in complex scenes is therefore a pressing problem. To improve tracking performance in long-term scenarios, this paper optimizes and improves a long-term tracking framework based on the local-to-global switching strategy, and studies the verification evaluation module, the re-detection module, and the model update module in turn. Improvements are proposed with respect to three aspects of the algorithm, namely robustness, speed, and accuracy, in order to seek a better balance among them. The proposed algorithms effectively improve tracking performance in complex long-term tracking scenes. The main research content of this paper is as follows: (1) To address the robustness and operating efficiency of the algorithm, a long-term
tracking algorithm, LTPT (Long-term Tracking via Probabilistic regression and Target center regression), based on probabilistic verification and two-stage re-detection is proposed. Through probabilistic regression, the uncertainty in the output space of the Siamese network is interpreted as a conditional probability. Through target center regression, the spatial center of the target is highlighted so that the region to be detected is coarsely regressed first, and the local tracking module then refines the result, which improves detection speed. First, an attention mechanism globally links the features of the paired input images to highlight the edges of the target, and depth correlation then highlights the target position, improving the spatial appearance representation of the target. Second, the output of local tracking is interpreted probabilistically to reduce verification complexity and improve verification confidence. Finally, a lightweight detection network is designed that focuses on the spatial robustness of object detection and uses object center regression to accelerate re-detection, so that the tracker recaptures the object more quickly. (2) To address the robustness and tracking accuracy of the algorithm, a fast online long-term tracking algorithm, LTCO (Long-term Tracking with Contrast Optimizer), based on contrastive learning optimization is proposed. Drawing on metric learning and contrastive learning, the tracker's target-perception and distractor-perception abilities are interpreted as distances between samples encoded in the feature space. First, training samples for the contrastive optimizer are collected from the output of the local tracker, and a label definition function is designed to adaptively divide the samples into triplets. Second, in the first training stage of the contrastive optimizer, the distance between positive samples and hard negative samples is shortened in the feature space, while the distance between them and
negative samples is widened, so that the similarity between positive samples and hard negative samples is learned through the metric learning loss function. Then, the target features are enhanced, and in the second training stage of the contrastive optimizer, a contrastive loss forces the optimizer to attend to the difference between positive samples and hard negative samples, so that it learns both target perception and distractor perception. Finally, in the tracking inference stage, the contrastive optimizer outputs a binary result to assist the online tracker in making update decisions. |
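
As a rough illustration of the first training stage of the contrastive optimizer described above, the sketch below shows one way a margin-based metric loss could pull a positive sample and a hard negative sample together while pushing both away from an ordinary negative sample. The abstract does not specify the actual loss, so the function names, the Euclidean distance, the averaging of the two "far" distances, and the margin value are all assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def stage_one_loss(positive, hard_negative, negative, margin=1.0):
    """Hypothetical stage-one metric loss (an assumption, not the paper's formula).

    The loss is zero once the positive/hard-negative pair is closer together
    than their mean distance to the ordinary negative by at least `margin`.
    """
    d_near = euclidean(positive, hard_negative)
    d_far = 0.5 * (euclidean(positive, negative)
                   + euclidean(hard_negative, negative))
    return max(0.0, d_near - d_far + margin)


# Toy 2-D features: the negative is far from both other samples, so no penalty.
satisfied = stage_one_loss([0.0, 0.0], [0.6, 0.8], [3.0, 4.0])

# Here the negative coincides with the positive, so the loss is positive.
violated = stage_one_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])
```

In a real tracker the features would come from a learned encoder and the loss would be minimized by gradient descent; the second-stage contrastive loss, which separates positive samples from hard negative samples, is not shown here.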