| Visual target tracking is a fundamental research topic in computer vision and pattern recognition. Its core task is to estimate the trajectory and state of a target continuously and stably over subsequent video frames, given only the initial state of an unknown object (e.g., its central location and extent), which provides a necessary basis for higher-level semantic tasks such as behavior recognition and scene understanding. Object tracking is applied not only in civilian fields, such as intelligent video surveillance and human-computer interaction, but also in military fields, such as aerial surveillance and military strikes. Over the past 20 years, with the development of computer technology and the maturing of mathematical theory, scholars have proposed many tracking algorithms and achieved remarkable progress. However, in practical tracking, trackers still suffer from model drift caused by non-rigid deformation, partial or full occlusion, in-plane and out-of-plane rotation, and other challenging factors. What makes the problem especially difficult is that several of these interferences can occur at the same time. Therefore, designing an accurate and robust object-tracking algorithm under unconstrained conditions remains an urgent open problem. Among tracking frameworks, Discriminative Correlation Filters (DCFs) have attracted much attention because of their trade-off between accuracy and efficiency. There are two main reasons for the success of this tracking paradigm. On the one hand, the training and detection samples are generated by cyclic shifts of a base patch, which approximates dense sampling and improves the filter’s discriminability. On the other hand, the convolution theorem converts the costly circular convolution in the spatial domain into an element-wise product in the frequency domain, which improves computational efficiency. In this dissertation, based on the DCF framework, we conduct research in 
several aspects, such as constructing a robust target appearance model, introducing an adaptive model-updating strategy, evaluating the reliability of the response, fusing different features, and adding a re-detection mechanism, in order to track the target stably and accurately in complex scenes. The main achievements of this dissertation can be summarized as follows: 1) We proposed an anti-drift correlation filter based on sparse response and adaptive spatial-temporal context awareness. It addresses the model drift, and even tracking failure, caused by boundary effects and filter degradation. Filter degradation is often caused by variations in the target’s appearance, such as occlusion, deformation, and out-of-plane rotation. Specifically, the target’s surroundings are directly integrated into the DCF framework to mitigate the boundary effect. In addition, the local-global response and the appearance variations between frames are used to inject adaptive temporal regularization into the filter training stage to prevent model degradation. Meanwhile, the sparsity of the response is taken into account, which further reduces the risk of model drift. Finally, we constructed a short-term APCE-Pool and Peak-Pool from the feedback of historical responses, which guide high-confidence filter updating and reveal the tracking state. When the model is persistently unreliable and an abnormality occurs, a Kalman filter is used to track the target. 2) We proposed a coarse-grained to fine-grained background-aware sparse filter model for visual tracking. Because of the redundancy and irrelevance among different features in the appearance representation, tracking performance drops severely when the target experiences interference such as fast motion and occlusion. Therefore, this dissertation proposes a coarse-grained to fine-grained sparse filter model by fully analyzing the correlation and 
complementarity between different features. When constructing the target model, features with positive reference value are selected by LASSO regression, which suppresses potential interference from the original image space. In addition, l2 constraints are imposed on the target and its surroundings in the filter training stage to adapt to scenes where the target’s appearance changes frequently. Finally, for target localization, a high-confidence coarse-to-fine localization strategy is proposed. Specifically, deep CNN features are used to locate the target roughly, and a Kalman filter monitors the localization results. Based on the coarse localization, accurate localization is achieved by adaptively fusing the responses of handcrafted and shallow CNN features. At the same time, multimodal detection is performed on the fine-localization response map. When the response is distorted, the Kalman filter estimates the target state. 3) We proposed a target tracking algorithm based on background constraint and aberrance suppression. In practical tracking, performance degrades when the target faces external interference, such as occlusion, background clutter, and scale change, as well as internal interference, such as in-plane/out-of-plane rotation. In the filter training stage, global context patches are considered to enhance the discriminative ability of the model under external interference. In addition, by constraining the rate of change of the response between two adjacent frames, the potential aberrance caused by internal interference, such as in-plane/out-of-plane rotation and deformation, can be effectively suppressed, yielding a more robust appearance model. On this basis, a high-confidence CF tracker, which uses HOG features to encode the target’s appearance, is learned to re-capture the target when it leaves the field of view. Finally, to address model degradation, the APCE of the historical response maps is used to assess the reliability of 
current tracking results, to coordinate the CF tracker and the re-detector, and to achieve stable and accurate tracking of the target. 4) We proposed a low-rank and context-aware correlation filter model for visual tracking. In the preceding work, the background information around the target is integrated directly into the filtering framework when building the filter model, which makes the model sensitive to dynamic changes of the background around the target and thus degrades tracking performance. Building on the background-aware filter model, we apply low-rank smoothness constraints across video frames so that the learned model is low-dimensional, yielding a more compact and robust background-aware model. In addition, because the response map may be distorted, multimodal detection is performed on it, and the filter is updated according to the detection results. At the same time, when the response is unreliable, samples are drawn around the target to recapture it, further suppressing model drift and improving tracking performance. |
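
The two ingredients that the abstract attributes to the DCF paradigm, training over cyclic shifts and evaluation in the frequency domain, together with the APCE reliability measure, can be illustrated with a minimal sketch. It is not one of the models proposed in the dissertation: it assumes a single-channel, MOSSE-style closed-form ridge regression, the function names (`gaussian_label`, `train_filter`, `detect`, `apce`) and the values of `sigma` and `lam` are hypothetical choices for illustration, and APCE is written in its commonly used form, (F_max - F_min)^2 divided by the mean of (F - F_min)^2.

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Desired response: a Gaussian peaked at index (0, 0), wrapping circularly."""
    h, w = shape
    ys = np.arange(h)
    xs = np.arange(w)
    ys = np.minimum(ys, h - ys)  # circular distance to row 0
    xs = np.minimum(xs, w - xs)  # circular distance to column 0
    d2 = ys[:, None] ** 2 + xs[None, :] ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_filter(patch, label, lam=1e-2):
    """Closed-form ridge regression over all cyclic shifts of `patch`.

    Every operation is element-wise in the Fourier domain, so no explicit
    spatial convolution and no explicit set of shifted samples is needed.
    """
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(label)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)

def detect(filt, patch):
    """Response map for a new patch: one element-wise product plus an inverse FFT."""
    return np.real(np.fft.ifft2(filt * np.fft.fft2(patch)))

def apce(response):
    """Average peak-to-correlation energy; higher means a sharper, cleaner peak."""
    f_max, f_min = response.max(), response.min()
    return (f_max - f_min) ** 2 / np.mean((response - f_min) ** 2)

# Toy usage: a random patch stands in for image features (an assumption).
rng = np.random.default_rng(0)
base = rng.standard_normal((32, 32))
filt = train_filter(base, gaussian_label(base.shape))

# Circularly shifting the patch moves the response peak by the same offset.
shifted = np.roll(base, (3, 5), axis=(0, 1))
response = detect(filt, shifted)
peak = np.unravel_index(np.argmax(response), response.shape)
print("peak:", peak, "APCE:", apce(response))
```

In this toy setting the response peak is expected to sit at the applied shift, and a response with a second, competing peak yields a lower APCE than a single clean peak, which is the kind of signal a confidence-gated update or re-detection step can threshold on.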