Single object tracking is a key problem in computer vision. In recent years, camera-based 2D object tracking algorithms have developed rapidly. At the same time, LiDAR-based 3D object tracking in point clouds is attracting increasing attention in autonomous driving, robotics, and augmented reality. Most existing point cloud trackers adopt the Siamese network framework and treat single object tracking as a similarity matching problem. First, a feature extraction network extracts point cloud features. Then, a feature fusion network fuses the template and search-area features using similarity metrics. Finally, a region proposal network based on deep Hough voting performs classification and regression. However, in most scenes point clouds are sparse and irregular, which poses great challenges for 3D point cloud object trackers. Considering the characteristics of the 3D point cloud object tracking task and the problems of existing trackers, the main contributions of this article are as follows:

(1) The region proposal network based on deep Hough voting suffers from a large number of outlier votes in sparse and irregular point cloud scenes. Moreover, simple Hough voting ignores the global semantic information of the target, which seriously degrades tracking accuracy. To solve these problems, we propose an accurate 3D single object tracker with local-to-global feature refinement. In the first stage, deep Hough voting produces coarse proposals. In the second stage, we design a local feature refinement module and a global feature refinement module that work together to achieve precise localization. The local feature refinement module eliminates noisy outliers in the unordered point cloud and obtains refined local features for the coarse proposals. The global feature refinement module then explores the relationships among all proposals to capture global context information at the proposal level.

(2) Most previous trackers use PointNet++ to extract point cloud features in challenging scenes. However, PointNet++ is not specifically designed for the single object tracking task and cannot effectively weigh different points according to their contributions to tracking. To solve this problem, we propose an accurate 3D point cloud tracker based on the transformer network. A transformer feature extraction network computes attention weights to generate attention features, guiding the tracker to focus on foreground points. In addition, we project the sparse point cloud onto the bird's-eye view to obtain dense features for classification and regression.

(3) Siamese trackers usually have a two-branch backbone to extract features, and they inevitably rely on computationally expensive cross-correlation operations for relation modeling, which seriously limits tracking speed, a key metric for trackers. To solve this problem, we design a one-stream 3D point cloud tracker. During feature extraction, template information is embedded into the search area, which avoids time-consuming cross-correlation operations. In this way, we greatly improve the speed of the tracker while achieving superior performance.
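The coarse-to-fine idea behind contribution (1) can be illustrated with a toy example. The sketch below is an assumption-laden simplification, not the learned local refinement module described above: the coarse proposal is simply the mean of all center votes, and the "refinement" is a hand-crafted filter that keeps only votes within a fixed radius of the coordinate-wise median. The names `coarse_proposal`, `refine_proposal`, and `radius` are hypothetical and chosen for illustration only.

```python
import math
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float, float]


def centroid(points: List[Point]) -> Point:
    """Average of a non-empty list of 3D points."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def coarse_proposal(votes: List[Point]) -> Point:
    """Stage 1: plain aggregation of center votes; outliers count as much as inliers."""
    return centroid(votes)


def refine_proposal(votes: List[Point], radius: float) -> Point:
    """Stage 2: hypothetical hand-crafted stand-in for a learned refinement module.

    Uses the coordinate-wise median as a robust anchor, discards votes farther
    than `radius` from it, and re-averages the remaining inliers.
    """
    anchor = tuple(median(v[i] for v in votes) for i in range(3))
    inliers = [v for v in votes if math.dist(v, anchor) <= radius]
    return centroid(inliers) if inliers else anchor


# Four votes cluster around (1, 1, 0); one outlier vote lands at (10, 10, 0).
votes = [(1.0, 1.0, 0.0), (1.2, 0.8, 0.0), (0.8, 1.2, 0.0), (1.0, 1.0, 0.0), (10.0, 10.0, 0.0)]
print(coarse_proposal(votes))              # pulled toward the outlier, roughly (2.8, 2.8, 0.0)
print(refine_proposal(votes, radius=0.5))  # roughly (1.0, 1.0, 0.0)
```

A single outlier is enough to drag the averaged proposal far from the true center, which is the failure mode that motivates a second refinement stage; a learned module can suppress such votes adaptively instead of relying on a fixed radius.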
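The bird's-eye-view projection mentioned in contribution (2) turns a sparse, unordered point set into a dense 2D grid. The following is a minimal sketch under stated assumptions: each grid cell stores a single hand-picked feature (the maximum height `z` of its points), whereas a real tracker would scatter learned per-point features into the grid. The function name `bev_max_height` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def bev_max_height(
    points: Sequence[Tuple[float, float, float]],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    cell_size: float,
) -> List[List[float]]:
    """Project (x, y, z) points onto a dense bird's-eye-view grid.

    Each cell holds the maximum z of the points falling into it; empty cells
    stay at 0.0. Points outside the given x/y ranges are ignored.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    n_cols = math.ceil((x_max - x_min) / cell_size)
    n_rows = math.ceil((y_max - y_min) / cell_size)
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    occupied = [[False] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        # Clamp indices to guard against floating-point rounding at the upper edge.
        col = min(int((x - x_min) // cell_size), n_cols - 1)
        row = min(int((y - y_min) // cell_size), n_rows - 1)
        if not occupied[row][col] or z > grid[row][col]:
            grid[row][col] = z
            occupied[row][col] = True
    return grid


points = [(0.5, 0.5, 1.0), (0.6, 0.4, 2.0), (3.5, 1.5, 0.5), (9.0, 9.0, 5.0)]
for row in bev_max_height(points, x_range=(0.0, 4.0), y_range=(0.0, 2.0), cell_size=1.0):
    print(row)
```

Every cell of the output grid has a value, so standard 2D convolutions can operate on it for classification and regression, which is harder to do directly on the irregular point set.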
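The one-stream design in contribution (3) can be pictured as running attention over a single sequence that contains both template and search-area tokens, so that search features absorb template information inside the same attention pass rather than through a separate cross-correlation step. The sketch below is a toy illustration, not the tracker's actual network: it assumes a single head, identity query/key/value projections, and tiny hand-written 2D tokens. The names `self_attention` and `one_stream_step` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Attention weights: softmax of scaled dot products against every token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Output: attention-weighted sum of the value vectors (here, the tokens themselves).
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def one_stream_step(template: List[Vector], search: List[Vector]) -> List[Vector]:
    """Attend over the concatenated [template; search] sequence and return search features."""
    joint = self_attention(template + search)
    return joint[len(template):]


template = [[1.0, 0.0]]
search = [[1.0, 0.0], [0.0, 1.0]]
print(one_stream_step(template, search))  # search features after joint attention
print(self_attention(search))             # search-only attention, for comparison
```

Comparing the two outputs shows that the first search token shifts toward the template token when both are in the same sequence; the template information is injected during feature extraction itself, with no extra correlation stage.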