| Visual object tracking is a fundamental but challenging task in computer vision. In recent years, object tracking has received widespread attention because of its extensive applications, e.g., in intelligent surveillance, human-computer interaction, autonomous driving, and other vision fields. In essence, object tracking is the process of locating the region of interest in the current frame, based on preceding video frames, through feature matching. However, accurately predicting the location and scale of a target in a complex environment is difficult because objects face unavoidable challenges during their motion. To address these problems, this paper proposes two Siamese network-based target tracking algorithms; the main work is as follows. (1) An object tracking algorithm combining cascaded region proposal network (RPN) feature fusion and coordinate attention is proposed. To enhance the feature representation during tracking, cascaded RPN fusion and a coordinate attention mechanism are applied to the tracker. The proposed network framework consists of three parts: a feature-extraction sub-network, a coordinate attention block, and a cascaded RPN block. We exploit the coordinate attention block, which embeds location information into channel attention, to establish long-range spatial location dependence while maintaining channel associations. Thus, the features of different layers are enhanced by the coordinate attention block. We then send these features separately into the cascaded RPN for classification and regression. The final position of the target is obtained from the two classification and regression results. To verify the effectiveness of the proposed method, we conducted comprehensive experiments on the OTB100, VOT2016, UAV123, and GOT-10k datasets. Compared with other state-of-the-art trackers, the proposed tracker achieves good performance and runs at real-time speed. (2) A sample-balanced and IoU-guided anchor-free visual tracking algorithm
is proposed. In this part, we first introduce balance factors and modulation coefficients into the cross-entropy loss function to address the classification inaccuracy caused by the imbalance between positive and negative samples, as well as between hard and easy samples, during training, so that the model focuses more on the scarce positive samples and on the hard samples that contribute most to training. Second, the Intersection over Union (IoU) loss function is further improved and applied to tracking. The improved IoU loss takes into account not only the IoU of the predicted and ground-truth boxes, but also the difference in aspect ratio between the two boxes and the smallest area that can enclose both. This helps generate more accurate regression offsets. The classification and regression losses are then combined into an overall loss, which is minimized through iterative optimization to improve the accuracy and robustness of target tracking. Experiments on the OTB2015, VOT2016, UAV123, and GOT-10k datasets demonstrate the competitive performance of the algorithm in this part. |
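
The two loss modifications described in part (2) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption: the classification term uses the common focal-loss form (a balance factor `alpha` and a modulating exponent `gamma`), and the regression term adds an enclosing-area penalty and an aspect-ratio penalty to `1 - IoU`. The function names, default values, and the unweighted sum of the penalties are hypothetical and are not taken from the paper.

```python
import math


def balanced_ce_loss(p, y, alpha=0.25, gamma=2.0):
    """Cross-entropy with a balance factor and a modulating factor.

    Assumed focal-loss form: -alpha_t * (1 - p_t)^gamma * log(p_t).
    p is the predicted foreground probability, y is the label (1 or 0).
    """
    p = min(max(p, 1e-7), 1.0 - 1e-7)            # avoid log(0)
    p_t = p if y == 1 else 1.0 - p                # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha    # positive/negative balance
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def iou_guided_loss(pred, gt):
    """IoU-based regression loss for boxes given as (x1, y1, x2, y2).

    Assumed combination: (1 - IoU) + enclosing-area penalty + aspect-ratio
    penalty. Both boxes must have positive width and height.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Overlap area and union area
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both boxes
    enclose = (max(px2, gx2) - min(px1, gx1)) * (max(py2, gy2) - min(py1, gy1))
    enclose_penalty = (enclose - union) / enclose

    # Aspect-ratio difference, scaled to [0, 1]
    ratio_penalty = (4.0 / math.pi ** 2) * (
        math.atan(gw / gh) - math.atan(pw / ph)
    ) ** 2

    return 1.0 - iou + enclose_penalty + ratio_penalty


# An easy positive contributes far less than a hard positive.
easy = balanced_ce_loss(0.9, 1)
hard = balanced_ce_loss(0.1, 1)

# A perfect prediction gives zero regression loss; disjoint boxes are
# still penalized through the enclosing-area term.
perfect = iou_guided_loss((0.0, 0.0, 2.0, 1.0), (0.0, 0.0, 2.0, 1.0))
disjoint = iou_guided_loss((0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0))
```

Under these assumptions, the overall training objective would be a weighted sum of the two terms over all sampled locations. The enclosing-area term keeps the regression loss informative even when the predicted and ground-truth boxes do not overlap, where plain `1 - IoU` is constant.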