| Visual object tracking is a fundamental yet challenging research topic in computer vision, with an increasingly wide range of applications, including video surveillance, human-computer interaction, autonomous driving, and military guidance. The main task of visual object tracking is to accurately infer, through a tracker, the position of the target in subsequent frames given its position in the first frame of the video. In recent years, a growing number of researchers have devoted themselves to visual object tracking based on deep neural networks, and tracking performance has steadily improved. However, because of similar background interference, deformation, illumination variation, scale changes, motion blur, and occlusion, accurately completing the tracking task remains highly challenging. This paper focuses on object tracking based on Siamese networks. The main contributions are summarized as follows: (1) An interactively guided visual object tracking method incorporating global context information is proposed. To address the problem that different channels of a deep feature map have different sensitivities to different targets, a squeeze-and-excitation module is introduced into the backbone network to recalibrate the feature responses of the individual channels and improve the sensitivity of the features to various targets. The target position is determined from the similarity map produced by cross-correlation between the features generated by the template branch and those generated by the search branch. Interaction between the template and search branches is essential for high-performance object tracking. In this work, an information interaction module is built that captures attention weights from the global response of each channel in the template branch, so as to normalize the channel features in the search branch and compensate for the limited receptive field of conventional convolution. The attention 
weights are used to guide the response of the search-branch features in the corresponding channels, so that the features of the two branches interact. The proposed algorithm improves tracking robustness for blurred backgrounds and fast-moving targets. (2) An accurate-localization visual object tracking method based on adaptive feature adjustment is proposed. Anchor-free tracking methods lack prior information when regressing targets of arbitrary shape. To address this problem, dilated convolution modules with different dilation rates are introduced at the output layer of the backbone network to obtain receptive-field information with different ratios and strengthen the ability to capture prior information. In addition, because the regression results for targets of different shapes are not accurate enough on single-scale features, features with different receptive-field information are each used to regress the target box, and trainable weight parameters adaptively fuse the multiple regression results, achieving adaptive feature adjustment and, ultimately, accurate target localization. The proposed algorithm improves the regression accuracy for objects of arbitrary shape. (3) A visual object tracking method based on local semantic information interaction is proposed. In the two branches of the Siamese network, the cross-correlation process is disturbed by background information at the edges of the template, which results in a blurred response. Therefore, when the response is computed, the features in the central region of the template are first cropped out as a new template feature. When similarity is computed by cross-correlation implemented as convolution, the entire template must be used as a convolution kernel that moves over the search area in a sliding-window fashion. This process introduces local redundancy 
in the similarity computation. In this work, the feature at each point of the new template feature is used as a convolution kernel and cross-correlated with the features of the search area, which ensures that each dimension of the output response encodes the local information of the corresponding position on the target. To address the lack of information interaction among the different dimensions, point-wise convolution is introduced to fuse the information of all channels. The proposed algorithm enhances the saliency of the target region, weakens the blurring effect of the background area, and significantly improves tracking performance. |
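
The point-wise correlation described in contribution (3) can be sketched roughly as follows. This is a minimal, dependency-free illustration under stated assumptions, not the author's implementation: the function names `center_crop`, `pointwise_xcorr`, and `conv1x1`, the nested-list layout `feat[c][y][x]`, and the crop size are all hypothetical, and the 1x1 weights would be learned in a real tracker, which would also use a tensor library rather than Python lists.

```python
"""Sketch: center-cropped template, point-wise cross-correlation, 1x1 fusion.

Feature maps are nested lists indexed as feat[channel][y][x].
All names and shapes here are illustrative assumptions.
"""


def center_crop(template, size):
    """Keep the central size x size region of a C x H x W template feature."""
    h, w = len(template[0]), len(template[0][0])
    top, left = (h - size) // 2, (w - size) // 2
    return [[row[left:left + size] for row in ch[top:top + size]]
            for ch in template]


def pointwise_xcorr(template, search):
    """Use every spatial point of the template as a 1x1 kernel.

    template: C x h x w, search: C x H x W  ->  response: (h*w) x H x W.
    Response channel k holds the similarity between template point k
    and every location of the search feature.
    """
    channels = len(template)
    th, tw = len(template[0]), len(template[0][0])
    sh, sw = len(search[0]), len(search[0][0])
    response = []
    for ty in range(th):
        for tx in range(tw):
            # C-dimensional feature vector at one template location
            kernel = [template[c][ty][tx] for c in range(channels)]
            response.append([
                [sum(kernel[c] * search[c][y][x] for c in range(channels))
                 for x in range(sw)]
                for y in range(sh)
            ])
    return response


def conv1x1(feat, weights):
    """Point-wise convolution that mixes channels at every location.

    feat: K x H x W, weights: M x K  ->  output: M x H x W.
    """
    h, w = len(feat[0]), len(feat[0][0])
    return [
        [[sum(wk * feat[k][y][x] for k, wk in enumerate(row))
          for x in range(w)]
         for y in range(h)]
        for row in weights
    ]
```

A typical call chain under these assumptions would be `conv1x1(pointwise_xcorr(center_crop(template, size), search), weights)`: the crop discards the template border that carries background, each response channel then corresponds to one template location, and the 1x1 step lets those channels exchange information.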