| Visual object tracking is an important research area in computer vision. A tracking algorithm models the motion of the target by examining its context information across consecutive video frames, then predicts the target's motion state from the established model and locates the target. In recent years, with the rapid development of computer technology, image and video processing, and artificial intelligence, visual object tracking has come to be widely used in intelligent video surveillance, intelligent transportation systems, intelligent visual navigation, and other fields. It has broad application prospects and has become an important part of everyday life. The speed and accuracy of correlation-filter-based tracking depend on how the target's features are represented, and deep neural networks have powerful feature representation capabilities. Combining the two to construct a multi-modal tracking model of position, size, and angle based on deep features and correlation filters should therefore be one of the development trends in visual tracking. This paper focuses on visual tracking algorithms based on correlation filters, learning methods for deep feature extraction networks, and a multi-modal tracking model based on deep features and correlation filtering. The main work is as follows: (1) Building on the classic correlation filter tracking algorithm, on the one hand, the trade-off between tracking accuracy and speed is studied, and an algorithm is proposed that improves tracking speed while preserving tracking accuracy; on the other hand, the tracking of changing targets is studied, a rotating-target tracking algorithm is proposed, and a real-time tracking algorithm based on hand-crafted features is designed that simultaneously tracks the position, size, and angle of the target. The paper carried out relevant experimental research and result
analysis. (2) After analyzing the limitations of the unsupervised deep tracking network (UDT) based on correlation filtering, a training method is proposed that improves the UDT feature extraction network using graph cut, global-contrast saliency detection, and GrabCut; it automatically selects salient target regions from video frames for unsupervised training of the feature network. A method is also proposed to reduce the complexity of the graph cut, global-contrast saliency detection, and GrabCut algorithms, so that multiple salient targets in a frame of unlabeled video can be annotated automatically, without manual intervention, providing rich label data for further training and improvement of the unsupervised feature extraction network. Relevant experimental research and result analysis of these methods are carried out. (3) Parts (1) and (2) are integrated: the multi-modal tracking algorithm based on deep networks and correlation filters is studied, and an integrated multi-modal (position, scale, angle) tracking algorithm based on deep networks and correlation filters is designed and implemented. The training dataset was constructed using the automatic labeling method proposed in (2), and the feature extraction network was trained on this dataset; the integrated tracking algorithm was tested on the OTB2015 dataset. Related experimental research and comparative analysis of the results are conducted. |