| Visual object tracking is a fundamental task in computer vision. Given the target information (position and size) in the initial frame, a tracker must estimate the state of the target in subsequent frames without any prior information about the target. Recently, correlation filters and Siamese networks have become the mainstream paradigms in visual tracking and have achieved promising performance. However, trackers still struggle with occlusion, background clutter, and deformation. Visual object tracking relies primarily on image features, which serve as mappings of the underlying image content. Learning the spatial information of these image features is crucial for perceiving the target’s position and shape and for distinguishing the target from background regions. This thesis investigates spatial-aware optimization strategies that enhance the representational power of features and improve the tracker’s robustness in challenging scenarios. The thesis makes the following five contributions. First, a spatial-aware correlation filter with an adaptive weight map is proposed to solve the tracking drift problem. The adaptive weight map consists of a color-histogram-based target likelihood and an a priori spatial regularization. Because the weight map emphasizes foreground target pixels and suppresses background pixels, it helps the correlation filter use target information effectively and distinguish the foreground target from distractors. Moreover, a high-confidence filter updating strategy is proposed to filter out unreliable historical frames and prevent low-quality historical frames containing background information from reducing the purity of the correlation model. Second, to handle occlusion during visual tracking, this thesis proposes a contour-aware long-term tracking method with reliable re-detection, consisting of a baseline tracker and a re-detection module. For the baseline tracker, a contour
constraint map is incorporated into the correlation filter tracker to identify non-target regions and reduce tracking drift. Moreover, the proposed re-detection module combines color and motion information to re-locate the target quickly and accurately when it reappears. Experimental results demonstrate that the proposed method effectively handles challenging scenarios such as occlusion and deformation, thereby improving long-term tracking performance. Third, a localization-aware Siamese tracking method is proposed to address inconsistent localization predictions. Siamese trackers split localization into two subtasks: classification for predicting the target position and regression for predicting the target shape. During online tracking, Siamese trackers consider only the classification scores of proposals and ignore the accuracy of the regression predictions. Hence, a lightweight ranking network is proposed to generate ranking scores for proposals, assigning higher scores to proposals with higher regression accuracy. The combination of classification and ranking scores serves as a new proposal selection criterion for online tracking and can significantly boost tracking performance. Moreover, to enhance localization precision, a sample ranking attention mechanism is introduced in the classification subtask; it emphasizes both distractors and positive samples that have high overlap with the ground-truth target box. This mechanism boosts the contribution of these important samples during classifier training, leading to the selection of more reliable positive samples as target representatives and further improving localization accuracy. Fourth, a target tracking method based on feature-space ranking supervision is proposed to address the confounding of background and foreground features, which hinders the tracker’s ability to distinguish the target from distractors. To overcome this issue, a classification ranking loss is proposed to model the
relationship between positive samples and hard negative samples, which effectively reduces the confidence assigned to distractors and prevents the tracker from being fooled by them. Moreover, this method introduces another ranking loss to align classification confidence scores with the corresponding regression accuracies for positive samples, enabling regression features to contribute to the learning of classification features. Finally, this thesis proposes a novel target tracking algorithm that leverages a local spatial attention mechanism to mitigate the inaccurate estimation of target bounding boxes encountered in trackers supervised by a global spatial attention mechanism. The proposed method incorporates two essential modules, namely criss-cross attention and correlation mask attention. The criss-cross attention module captures discriminative features within a criss-cross-shaped region to focus on the target region. The correlation mask attention module reduces false matches between foreground and background pixels by seeking spatial continuity. Experiments conducted on diverse datasets demonstrate the effectiveness of the proposed modules. |