Point cloud registration is a fundamental task in 3D computer vision with applications in many fields, such as 3D reconstruction, autonomous driving, and augmented reality. Existing methods can be divided into two categories: descriptor-based methods, which extract robust local features from the geometric structure of points, and correspondence-based methods, which predict exact correspondences according to the geometric relationships between points. Learning highly accurate and robust features is the key to both. Some existing works have achieved high registration recall; however, they rely solely on the geometric structure of points, so feature performance remains limited by the ambiguity caused by repetitive and planar geometric structures in point clouds. This leads to two problems: (1) poor distinctiveness of descriptors; (2) poor outlier-removal ability of correspondence features. With the development of point cloud technology, acquisition devices capture not only point clouds but also images sharing the same scene semantics. Utilizing image texture information to improve the distinctiveness of point cloud features (descriptors and correspondence features) is a solution to these problems. Therefore, this thesis investigates multimodal feature fusion methods for point cloud registration.

First, for problem (1), this thesis designs an interpretable multimodal feature fusion method for point cloud registration. It uses a Transformer-based cross-attention mechanism to fuse point cloud structure information with image texture information of the same scene, significantly improving the distinctiveness of descriptors under the guidance of texture information. Meanwhile, a descriptor activation mapping method is proposed to explain the descriptor extraction process by visualizing the distribution of points that contribute effectively to descriptor extraction. In addition, a contrastive learning loss over adversarial positive and negative sample pairs is used to constrain the relationships between points. Compared with state-of-the-art descriptor-based methods, the proposed method improves feature matching recall by 0.2%, inlier ratio by 6.3%, and registration recall by 3.7% on large real-world datasets.

Second, for problem (2), building on the method for problem (1), this thesis designs a general multimodal correspondence feature fusion method for point cloud registration. It fuses paired image texture information and correspondence information in multiple stages, effectively improving the outlier-removal ability of correspondence features through texture-information constraints. Meanwhile, this thesis proposes a convolutional positional encoding to compensate for the lack of local information in the global attention mechanism, further improving feature distinctiveness. In addition, the method can be applied to any correspondence feature extraction network without changing training strategies or loss functions. When applied to state-of-the-art correspondence-based models, the designed method improves inlier ratio by 9.8% and registration recall by 8.44% on large real-world datasets.

To verify the effectiveness and robustness of the proposed methods, comparative experiments were performed against state-of-the-art methods on multiple large real-world datasets. Ablation studies were also performed to verify the effectiveness of each component of the proposed methods.
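The cross-attention fusion described above can be sketched in a few lines. The snippet below is a minimal illustration, not the thesis's implementation: it assumes point descriptors act as queries and image patch features as keys/values, with a residual connection adding attended texture back onto the geometric features. All shapes, names, and the single-head formulation are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(point_feats, image_feats, Wq, Wk, Wv):
    """Sketch of cross-attention fusion (illustrative, not the thesis model).
    point_feats: (N, d) geometric features of N points  -> queries
    image_feats: (M, d) texture features of M image patches -> keys/values
    """
    Q = point_feats @ Wq                             # (N, d)
    K = image_feats @ Wk                             # (M, d)
    V = image_feats @ Wv                             # (M, d)
    d = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)    # (N, M) attention weights
    # Residual: geometric features plus attended texture information
    return point_feats + attn @ V

rng = np.random.default_rng(0)
d = 8
pts = rng.normal(size=(5, d))                        # 5 point descriptors
img = rng.normal(size=(12, d))                       # 12 image patch features
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out = cross_attention_fuse(pts, img, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one fused descriptor per point
```

In this formulation the attention weights over image patches tell each point which texture regions it draws from, which is the kind of signal the descriptor activation mapping visualizes.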
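The convolutional positional encoding mentioned for problem (2) can also be illustrated. The sketch below is an assumption-laden toy version: a small 1-D convolution over neighboring correspondence features supplies the local information that global attention lacks, added back residually. The kernel shape, zero padding, and residual form are illustrative choices, not the thesis's exact design.

```python
import numpy as np

def conv_positional_encoding(feats, kernel):
    """Toy convolutional positional encoding (illustrative sketch).
    feats:  (N, d) features of N correspondences
    kernel: (k,)   shared per-channel convolution weights
    """
    N, d = feats.shape
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(feats, ((pad, pad), (0, 0)))     # zero-pad the sequence
    out = np.zeros_like(feats)
    for i in range(N):
        # Weighted sum of the k neighbouring feature vectors
        out[i] = (kernel[:, None] * padded[i:i + k]).sum(axis=0)
    return feats + out                               # residual connection

rng = np.random.default_rng(1)
f = rng.normal(size=(6, 4))                          # 6 correspondence features
enc = conv_positional_encoding(f, np.array([0.25, 0.5, 0.25]))
print(enc.shape)  # (6, 4)
```

Because the encoding is computed from the features themselves rather than from fixed indices, it can be dropped into an existing attention block without altering training strategies or loss functions, which matches the plug-in property claimed above.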