With the rapid development of sensor technology, the ability of unmanned vehicles to perceive their environment has made great progress. In recent years, unmanned vehicles have begun to enter public spaces and are widely used in military, aerospace, and civilian transportation. Localization is one of the key perception capabilities of these vehicles, supporting functions such as planning, navigation, and control. Among the various sensors used for localization, LiDAR actively emits laser beams to obtain accurate 3D structural information about the environment and is insensitive to illumination changes. Benefiting from these characteristics, global localization using LiDAR point clouds can operate around the clock and achieve high accuracy. Global localization based on point clouds is therefore of great research significance for unmanned vehicles to achieve long-term, full autonomy.

Current point-cloud-based global localization methods face several problems. First, due to the sparsity of LiDAR scans, local features extracted directly from the point cloud often have low discriminability, so an appropriate intermediate representation of point clouds for feature encoding needs to be developed. Second, current point cloud methods usually focus on the sub-problem of place recognition and cannot provide global pose estimation; how to design a complete global localization framework remains an open problem. Third, unmanned vehicles may have arbitrary orientations on roads, so robust localization requires rotation-invariant point cloud features. Fourth, image-to-point cloud cross-modal localization has high application value but has so far received relatively little research attention. In view of these four problems, this thesis carries out the following research:

1. Taking the Bird's Eye View (BEV) image as an intermediate representation of the point cloud, this thesis proposes a two-stage global localization framework, BVMatch. BVMatch transforms the global localization problem of point clouds into a global localization problem of images. It uses the Bird's-eye View Feature Transform (BVFT), a local descriptor specially designed for BEV images, to realize global localization through two stages: place recognition and pose estimation. Experiments show that BVMatch outperforms other methods in place recognition while being insensitive to rotation. In addition, BVMatch achieves high matching success rates and estimation accuracy in pose estimation.

2. To further improve the efficiency of the global localization framework, a direct global localization method based on the Histogram of Orientations of Principal Normals (HOPN) is proposed. HOPN descriptors are constructed from principal normals computed in the bird's eye view and are designed to be rotation-invariant and highly discriminative. The proposed direct localization method uses HOPN to match local point clouds against global maps and achieves robust pose estimation from matches with a low inlier ratio by means of a consensus set maximization algorithm. Experiments show that the method can localize single LiDAR scans on a large-scale map covering over 4 km² and is robust to viewpoint changes. Its performance can be further boosted by accumulating local point clouds into a local map.
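Both methods above, as well as the BEVPlace network introduced next, build on rasterizing a LiDAR scan into a BEV image. The following minimal sketch illustrates one common form of this intermediate representation, a density image; the grid extent, resolution, and clipping threshold are illustrative assumptions rather than the exact parameters used in this thesis.

```python
import numpy as np

def point_cloud_to_bev(points, x_range=(-40.0, 40.0),
                       y_range=(-40.0, 40.0), resolution=0.4):
    """Rasterize an (N, 3) LiDAR scan into an 8-bit BEV density image."""
    # Keep only points inside the region of interest.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]

    # Map metric coordinates to integer grid indices.
    cols = ((pts[:, 0] - x_range[0]) / resolution).astype(np.int32)
    rows = ((pts[:, 1] - y_range[0]) / resolution).astype(np.int32)
    h = int((y_range[1] - y_range[0]) / resolution)
    w = int((x_range[1] - x_range[0]) / resolution)

    # Count points per cell to form a density map.
    density = np.zeros((h, w), dtype=np.float32)
    np.add.at(density, (rows, cols), 1.0)

    # Clip and rescale so the image is robust to point-count variation
    # (the clip value 10 is an arbitrary illustrative choice).
    density = np.minimum(density, 10.0) / 10.0
    return (density * 255).astype(np.uint8)
```

Normalizing cell counts in this way makes the resulting image largely insensitive to fluctuations in point density, which is one reason the BEV image is an attractive intermediate representation for feature encoding.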
3. To further improve the success rate and generalization ability of global localization, a bird's-eye view place recognition network called BEVPlace is proposed. BEVPlace uses an equivariant network to extract local features and NetVLAD for global feature aggregation. Benefiting from the robust representation of BEV images and the powerful representation capabilities of convolutional networks, BEVPlace achieves state-of-the-art performance on multiple public datasets and demonstrates strong robustness to rotation as well as strong generalization ability. For global localization, a position estimation method is also proposed that recovers the distance between point clouds by mapping from the feature space to the geometric space.

4. Aiming at the cross-modal localization problem between images and point clouds, an image-to-point cloud cross-modal place recognition method (I2P-Rec) is proposed. I2P-Rec approaches the problem from the perspective of modality conversion, as sketched below: it uses a depth estimation network to recover point clouds from images, then projects the point clouds into BEV images and uses a convolutional network to extract global features. Experimental results show that I2P-Rec achieves a recall rate of over 90% at Top-1% when localizing images in large-scale point cloud maps with only a small amount of training data, verifying the feasibility of the cross-modal place recognition task.
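To make the modality conversion step concrete, the following hedged sketch back-projects a predicted depth map into a pseudo point cloud through the standard pinhole camera model; the result can then be rasterized with a function such as point_cloud_to_bev above. The intrinsics K and the source of the depth map are placeholders, and this is not claimed to be I2P-Rec's exact implementation.

```python
import numpy as np

def depth_to_point_cloud(depth, K):
    """Back-project an (H, W) metric depth map with intrinsics K (3x3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Homogeneous pixel coordinates, one column per pixel.
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)], axis=0)
    # Pinhole model: X = depth * K^{-1} [u, v, 1]^T, in the camera frame.
    rays = np.linalg.inv(K) @ pix
    pts_cam = rays * depth.ravel()[None, :]
    # Convert from the camera frame (x right, y down, z forward) to a
    # LiDAR-style frame (x forward, y left, z up) so that the BEV grid
    # is aligned with the ground plane.
    x, y, z = pts_cam
    return np.stack([z, -x, -y], axis=1)  # shape (H*W, 3)
```

A typical usage under these assumptions would be bev = point_cloud_to_bev(depth_to_point_cloud(pred_depth, K)), after which the BEV image can be fed to the convolutional feature extractor.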