| Video object tracking (VOT) is a fundamental task in computer vision with a broad range of applications, such as autonomous piloting and video surveillance. The long-term tracking protocol requires trackers to deal with extra challenges such as target appearance variations and frequent occlusions. A major problem in long-term tracking is how to update the target model while avoiding model degradation. Existing online-adaptive strategies can hardly be applied to all tracking subtasks (classification, regression and segmentation) simultaneously, and low-quality samples may quickly degrade the target model. To address these issues, this paper first reviews the development of long-term tracking and describes recent state-of-the-art online trackers, then introduces the proposed methods as follows: (1) To solve the problem that traditional correlation-based trackers fare poorly under target deformation and appearance variation, this paper proposes an online-adaptive classification and regression network: the entire network is constructed in a fully convolutional manner, and the weights of two key kernels are updated online with samples collected during the tracking inference phase. Moreover, a stacked-GRU-based sample filter is introduced to supervise the sample collection process and improve the robustness of online updating. (2) Existing matching-based segmentation methods used in trackers are still restricted to the target model created in the first frame, which leads to a lack of long-term adaptability. To address this issue, this paper proposes a novel long-term segmentation tracker leveraging a memory attention network, which estimates the relevance between the current frame and memory frames with a partial cost volume to form an adaptive segmentation template. The stacked-GRU-based sample filter is also introduced here to supervise sample collection. As for engineering application, this work deploys the above-mentioned algorithms on a Jetson Nano device to simulate interactive Unmanned Aerial Vehicle (UAV) photographic tracking based on the UAV123 dataset. |
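
The idea of a quality-gated memory that is read out by attention can be illustrated with a minimal sketch. It is not the paper's implementation: the class `GatedMemory`, its parameters `capacity` and `min_confidence`, and the use of short feature vectors are all hypothetical. A plain confidence threshold stands in for the stacked-GRU sample filter, and a dot-product softmax stands in for the partial cost volume, since the abstract gives no details of either.

```python
import math
from collections import deque


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class GatedMemory:
    """Bounded memory of (key, value) feature pairs with a quality gate.

    Assumption: the gate is a plain confidence threshold standing in for
    the stacked-GRU sample filter, whose details the abstract omits.
    """

    def __init__(self, capacity=4, min_confidence=0.5):
        self.items = deque(maxlen=capacity)  # oldest entry is evicted first
        self.min_confidence = min_confidence

    def maybe_add(self, key, value, confidence):
        """Store the sample only if it passes the gate; return True if stored."""
        if confidence < self.min_confidence:
            return False  # low-quality sample: keep it out of the model
        self.items.append((key, value))
        return True

    def read(self, query):
        """Attention readout: softmax(query . key) weighted sum of values.

        Assumption: dot-product relevance replaces the partial cost volume.
        """
        if not self.items:
            raise ValueError("memory is empty")
        scores = [sum(q * k for q, k in zip(query, key)) for key, _ in self.items]
        weights = softmax(scores)
        dim = len(self.items[0][1])
        return [
            sum(w * value[i] for w, (_, value) in zip(weights, self.items))
            for i in range(dim)
        ]
```

In this sketch a frame rejected by `maybe_add` never influences `read`, which mirrors the stated goal of keeping low-quality samples from degrading the target model, while accepted frames shift the template toward the most relevant memory entries.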