| Visual object tracking has long been a research focus of scholars at home and abroad. After years of research and development, tracking algorithms gradually ran into bottlenecks. The rise of deep learning, however, has injected new vitality into tracking technology, sustaining its long-term development and moving the field towards a new stage. At present, the performance of target tracking algorithms is still affected by factors such as illumination changes, scale changes, and occlusion in natural scenes, so robust tracking remains very difficult. Taking SiamRPN++ as the baseline algorithm, this paper proposes a Siamese network visual object tracking method with channel and spatial attention mechanisms to address tracking failure caused by target occlusion and reduced tracking accuracy caused by target interference in short-term and long-term motion scenarios. The method builds on the Siamese network framework for deep feature extraction and a region proposal network that adaptively generates anchor boxes, supplemented by multi-layer feature fusion and an adaptive template update strategy, to predict target positions accurately. The result is a more robust target feature extraction method and decision model that improves the success rate and accuracy of the algorithm. The main research content and innovations of this paper are as follows: (1) To address the problem that the backbone network in traditional target tracking algorithms lacks discrimination of feature information across channels and spatial positions, we propose a Siamese network target tracking algorithm that combines dual attention mechanisms and feature fusion. First, we combine channel and spatial attention mechanisms with the backbone network to enhance effective target information in the channel and spatial dimensions while suppressing ineffective information, thus improving the model’s ability to discriminate 
targets. Second, we perform multi-level fusion on the features from each layer so that even shallow layers can obtain deep semantic information, thereby improving the robustness of the algorithm under complex conditions. Finally, experiments on two datasets, OTB100 and LaSOT, demonstrate that the algorithm extracts more accurate target feature information, resulting in an overall performance improvement. (2) Anchor-based target tracking algorithms that rely on manually designed anchor boxes suffer from candidate boxes that cannot frame unusual targets (especially those that are particularly tall or wide), many useless samples, an imbalance between positive and negative samples, and a large amount of computation. To address these problems, we propose a Siamese network target tracking algorithm based on adaptive generation of anchor boxes. First, we use an adaptive region proposal network to generate the anchor boxes. Second, we use pixel-wise cross correlation to address feature space blurring when fusing the template and search features. This operation makes the fused features clearer, thereby improving bounding box localization accuracy and enhancing the network’s modeling ability under complex conditions. Finally, experiments on two datasets, OTB100 and LaSOT, demonstrate that after improving the anchor box generation method and addressing feature space blurring, the algorithm obtains more accurate tracking boxes, effectively improving its success rate and accuracy. (3) To address tracking drift or failure caused by target occlusion or missing feature information during long-term tracking, we propose a Siamese network long-term target tracking algorithm based on anti-interference templates. This algorithm improves upon the target tracking algorithm in Chapter 4 by introducing an adaptive template update mechanism that enhances 
the algorithm’s ability to track occluded targets by updating the template information for the current frame. In addition, we design a template update loss function to train the adaptive template update mechanism. Finally, experiments on the LaSOT and UAV123 datasets, including the UAV20L subset, demonstrate that this algorithm outperforms algorithms without a template update mechanism, handles target occlusion in complex scenes during long-term tracking more effectively, and is more robust. |
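
The pixel-wise cross correlation mentioned in contribution (2) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the version below assumes one common definition: the feature vector at each spatial position of the template acts as a 1x1 kernel over the search feature map, which yields one response map per template position. The function name `pixelwise_xcorr`, the nested-list layout, and the toy values are all hypothetical and chosen only for illustration.

```python
def pixelwise_xcorr(template, search):
    """Pixel-wise cross correlation between two feature maps (illustrative sketch).

    template: nested list shaped [C][Hz][Wz]
    search:   nested list shaped [C][Hx][Wx]
    returns:  nested list shaped [Hz*Wz][Hx][Wx]
    """
    channels = len(template)
    if channels != len(search):
        raise ValueError("template and search must have the same channel count")
    hz, wz = len(template[0]), len(template[0][0])
    hx, wx = len(search[0]), len(search[0][0])

    response = []
    for i in range(hz):
        for j in range(wz):
            # The C-dimensional vector at template position (i, j) is used
            # as a 1x1 kernel applied at every search position (u, v).
            kernel = [template[c][i][j] for c in range(channels)]
            response.append([
                [sum(kernel[c] * search[c][u][v] for c in range(channels))
                 for v in range(wx)]
                for u in range(hx)
            ])
    return response


# Toy input (assumed values): C=2, template 1x2, search 2x2.
template = [[[1.0, 0.0]], [[0.0, 2.0]]]
search = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [1.0, 1.0]]]
out = pixelwise_xcorr(template, search)
print(len(out), len(out[0]), len(out[0][0]))
```

Because every template position keeps its own response map, the output keeps spatial detail that a single whole-template kernel would average away, which is one plausible reading of why the fused features are described as clearer.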