| As one of the basic tasks of computer vision, visual single-object tracking has a wide range of applications in intelligent video surveillance, autonomous driving, and other fields. In recent years, SiamFC (Siamese Fully Convolutional), an object tracking algorithm based on fully convolutional Siamese networks, has attracted great attention for its high tracking accuracy and speed. In this paper, we study visual single-object tracking based on this algorithm: given the information of an arbitrary single object in the initial frame, the tracker predicts the position and scale of that object in subsequent frames of the video. The algorithm proposed in this paper maintains both tracking accuracy and real-time performance in complex tracking scenarios. Compared with current tracking algorithms that incorporate various complex mechanisms, the proposed algorithm has fewer parameters and can be deployed on devices with low computing power, such as embedded devices, which gives it broad application prospects. The main research contents of this paper are as follows: (1) To address the poor tracking accuracy and robustness of the SiamFC algorithm in complex tracking scenarios such as object occlusion, deformation, and rotation, we propose a real-time object tracking algorithm based on a two-branch Siamese network, named Siam SAF (Siamese Semantic, Appearance and Fusion). Firstly, we build the network framework from a two-branch Siamese network, consisting of a semantic branch and an appearance branch, and a fusion module. Secondly, the appearance branch uses a shallow neural network to extract appearance features rich in structural and spatial location information, and the semantic branch uses a deep neural network to extract highly robust semantic features rich in high-level semantic information. Finally, the response maps of the appearance branch and the semantic branch
are fused by an adaptive fusion module. Experiments on the OTB2015 and VOT2018 test sets show that, compared with the baseline algorithm, the precision and overlap success rate of the proposed algorithm are improved by 10% and 7%, respectively, while meeting real-time requirements. (2) To address the insufficiently discriminative features extracted by the SiamFC algorithm, which uses a shallow neural network as its feature extractor, we propose a high-frame-rate object tracking algorithm based on the CB-Fire (Crop, Batch Normalization and Fire Module) module, named Siam Squeeze+. First, SqueezeNet, a lightweight neural network built from Fire modules, is used as the backbone network. Next, crop and batch normalization layers are introduced into the Fire module to address, respectively, the loss of tracking accuracy caused by the padding operation of the convolutional layers and the difficulty of training a deep network. Finally, the feature extraction network of the algorithm is built from CB-Fire modules. The deep feature extraction network used in Siam Squeeze+ reduces both the number of network parameters and the feature extraction time while making the extracted features more discriminative. Extensive experiments on the OTB2015 and VOT2018 test sets show that, compared with the baseline algorithm, the precision and overlap success rate of the proposed algorithm are improved by 8.1% and 6.2%, respectively. The network has only 28.3% as many parameters as the baseline algorithm and runs far faster than real-time requirements demand, so the proposed algorithm has good application prospects. (3) To address the problem that the cross-correlation operation of the SiamFC algorithm lets every feature channel contribute equally to the similarity calculation, so that the tracker does not focus on the features most helpful for tracking, a lightweight Siamese network object tracking algorithm that
incorporates an attention mechanism is proposed. Firstly, we use a modified MobileNetV1 as the feature extraction network; its depthwise separable convolutions effectively reduce the number of parameters compared with traditional convolutions. Then, we add a lightweight channel attention mechanism at the end of the template branch to increase the weight of important channels. Finally, experiments on the OTB2015 and VOT2018 test sets show that, compared with the baseline algorithm, the precision and overlap success rate of the proposed algorithm are improved by 4.2% and 2.8%, respectively. The proposed algorithm has fewer network parameters than the baseline algorithm and significantly improves on its tracking accuracy and speed. |
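
The channel weighting described in contribution (3) can be illustrated with a minimal pure-Python sketch. It is not the implementation from this work: the function names `cross_correlate` and `peak_position`, the nested-list feature layout, and the hand-picked weights are all assumptions made for illustration. With no weights, every channel contributes equally to the response map, as in the SiamFC cross-correlation; passing per-channel weights mimics what a channel attention module on the template branch would achieve, since the correlation is linear in the template features.

```python
def cross_correlate(template, search, channel_weights=None):
    """Slide a C x h x w template over a C x H x W search feature map.

    Both inputs are nested lists indexed as [channel][row][col].
    Returns an (H-h+1) x (W-w+1) response map as nested lists.
    channel_weights is a hypothetical stand-in for attention output;
    None means every channel contributes equally.
    """
    num_channels = len(template)
    if len(search) != num_channels:
        raise ValueError("template and search must have the same channel count")
    if channel_weights is None:
        channel_weights = [1.0] * num_channels
    t_h, t_w = len(template[0]), len(template[0][0])
    s_h, s_w = len(search[0]), len(search[0][0])

    response = []
    for y in range(s_h - t_h + 1):
        row = []
        for x in range(s_w - t_w + 1):
            score = 0.0
            for c in range(num_channels):
                # Similarity of this channel at offset (y, x).
                channel_score = sum(
                    template[c][i][j] * search[c][y + i][x + j]
                    for i in range(t_h)
                    for j in range(t_w)
                )
                score += channel_weights[c] * channel_score
            row.append(score)
        response.append(row)
    return response


def peak_position(response):
    """Return (row, col) of the highest response value."""
    best = max(
        ((value, y, x) for y, row in enumerate(response) for x, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1], best[2]


# Toy features: 2 channels, 1x1 template, 1x3 search region.
template = [[[1.0]], [[1.0]]]
search = [[[3.0, 0.0, 1.0]], [[0.0, 2.0, 1.0]]]

uniform = cross_correlate(template, search)
weighted = cross_correlate(template, search, channel_weights=[0.1, 2.0])
```

In this toy input the equally weighted response peaks where channel 0 is strong, while down-weighting channel 0 moves the peak to the location favored by channel 1. In a real tracker the weights would be predicted by the attention module, not set by hand.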