| Visual object tracking is one of the basic tasks of vision systems and has important applications in intelligent video surveillance, human-computer interaction, and autonomous driving. After years of development, visual object tracking has made great progress. Tracking methods based on the discriminant correlation filters network and the Siamese fully-convolutional network, as a new trend in this field, have received increasing attention from researchers. However, most current visual object tracking algorithms consider only the appearance features of the current frame and do not make full use of the rich temporal context between video frames. Under challenges such as occlusion, deformation, and interference from similar objects in complex scenes, tracking performance degrades or tracking fails altogether. In response to this problem, this paper explores solutions based on the discriminant correlation filters network and the Siamese fully-convolutional network, respectively. The main research contents are as follows: (1) To address the insufficient discriminative power of template features in existing correlation-filter-based tracking algorithms, this paper improves the discriminability of template features by jointly considering the features of multiple historical frames. Specifically, a feature alignment network is designed to align the features of several historical frames to the frame immediately preceding the current frame; a spatial-temporal attention mechanism then assigns importance weights to the aligned features of the different historical frames, and the weighted features are fused to obtain more robust template features that cope better with challenges such as occlusion and deformation. (2) The response of tracking algorithms based on the Siamese fully-convolutional network framework is prone to multiple peaks under interference from similar objects, which degrades tracking performance. To address this, this paper designs a motion direction estimation network that introduces motion information and estimates the target's motion trend in the next frame. This estimate is used as a constraint to suppress interference from similar objects and is combined with the target's appearance information to determine the optimal target state. Together, these contributions allow correlation-filter-based trackers and Siamese fully-convolutional network trackers to benefit from the rich temporal information between video frames, which improves tracker performance. Experiments on object tracking evaluation datasets show that these improvements are effective. |
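
The two ideas summarized above can be illustrated with a minimal NumPy sketch. It is not the networks described in this paper: a softmax over cosine similarities stands in for the learned spatial-temporal attention, a hand-written directional prior stands in for the motion direction estimation network, and the historical features are assumed to be already aligned. The function names and the `strength` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def fuse_aligned_features(aligned, reference):
    """Fuse aligned historical features with per-position weights.

    aligned:   (T, C, H, W) features of T historical frames, assumed
               to be already aligned to the reference frame.
    reference: (C, H, W) features of the reference frame.
    Assumption: cosine similarity + softmax replaces learned attention.
    """
    eps = 1e-8
    a = aligned / (np.linalg.norm(aligned, axis=1, keepdims=True) + eps)
    r = reference / (np.linalg.norm(reference, axis=0, keepdims=True) + eps)
    sim = (a * r[None]).sum(axis=1)                   # (T, H, W)
    w = np.exp(sim - sim.max(axis=0, keepdims=True))  # stable softmax
    w /= w.sum(axis=0, keepdims=True)                 # normalize over T
    return (w[:, None] * aligned).sum(axis=0)         # (C, H, W)


def select_peak(response, prev_pos, direction, strength=0.5):
    """Pick the response peak that best agrees with a motion direction.

    response:  (H, W) appearance response map.
    prev_pos:  (row, col) target position in the previous frame.
    direction: (d_row, d_col) estimated motion direction.
    Assumption: the prior 1 + strength * cos(angle) is a hand-written
    stand-in for a learned motion constraint.
    """
    h, w = response.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dy = ys - prev_pos[0]
    dx = xs - prev_pos[1]
    dist = np.sqrt(dy ** 2 + dx ** 2)
    dist[dist == 0] = 1.0                  # avoid division by zero
    d = np.asarray(direction, dtype=float)
    d = d / (np.linalg.norm(d) + 1e-8)
    cos = (dy * d[0] + dx * d[1]) / dist   # agreement in [-1, 1]
    scored = response * (1.0 + strength * cos)
    return np.unravel_index(np.argmax(scored), scored.shape)
```

In this sketch, a historical frame whose aligned features disagree with the reference (for example, because the target was occluded) receives a small weight in the fused template, and a distractor peak lying opposite to the estimated motion direction is down-weighted before the maximum is taken.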