
Research On Indoor Visual SLAM Mapping Based On Multi-scale Feature Fusion And Semantic Segmentation

Posted on: 2024-06-26  Degree: Master  Type: Thesis
Country: China  Candidate: W H Yuan  Full Text: PDF
GTID: 2568307130472204  Subject: Electronic Science and Technology
Abstract/Summary:
Visual simultaneous localization and mapping (SLAM) is the process by which a mobile robot uses optical cameras to collect information about its surroundings, construct a consistent map, and localize itself in real time against that map. It is one of the key technologies for solving the localization, map construction, and navigation problems that mobile robots face in unknown environments. Although visual SLAM technology is relatively mature, it still performs poorly in challenging environments with insufficient light or missing textures, where feature extraction is often inadequate and feature tracking is frequently lost. In addition, the maps built by traditional visual SLAM represent the actual environment weakly and carry no semantic information, making them difficult to use as a basis for higher-level intelligent tasks such as navigation, path planning, and robot interaction. To address this, this thesis extracts both point and line features as matching and tracking objects within the visual odometry of visual SLAM, makes the two feature classes work in concert, and improves the robustness of the visual SLAM system in challenging environments. Deep learning is then combined with visual SLAM to construct 3D semantic octree maps containing only static objects, enhancing the representational capability of the maps. The main research contents are as follows:

(1) A camera pose estimation algorithm based on point-line feature matching is investigated, and a fast processing scheme for point-line features is proposed. First, point and line features are extracted from the image; point features are matched and tracked with the pyramid Lucas-Kanade (LK) optical flow method, line features are matched with geometric constraints in non-keyframes, and LBD descriptors are computed to match line features in keyframes. Mismatches are then removed from the matching results, and the reprojection error of the point-line features is minimized to estimate the camera pose. Finally, extensive experimental comparison with the ORB-SLAM2 algorithm verifies the effectiveness of the proposed method for localization and tracking in challenging environments.

(2) A semantic segmentation network based on multi-scale feature fusion is investigated. In the deep learning module, building on weakly supervised learning, a cross-supervised learning approach is proposed that combines a large amount of weakly labeled data with a small amount of fully labeled data. Pairing the weakly labeled data with generated high-quality pseudo-labels effectively alleviates the shortage of semantic labels in practical semantic segmentation tasks. Meanwhile, a balanced fusion mechanism for local and global attention is designed on top of a multi-branch backbone, and multi-scale feature fusion enables the network to extract richer feature information from small-scale data domains and improves segmentation accuracy. Comparisons with mainstream weakly supervised methods and detailed ablation experiments on the PASCAL VOC 2012 public dataset validate the performance advantages of the network.

(3) A dynamic object elimination algorithm that combines semantic segmentation with visual SLAM is investigated. The semantic segmentation network from (2) is first introduced into the visual SLAM system, and the predicted segmentation masks are used to remove the large number of feature points located on moving objects; the epipolar geometric constraint is then applied, computing the epipolar distance of each retained feature point to judge whether it is moving. Finally, an experimental comparison with the DS-SLAM algorithm verifies that the proposed method removes dynamic objects more effectively and improves the pose estimation accuracy of the system.

(4) Indoor static semantic octree map construction is investigated. First, a single-frame semantic point cloud is generated from the depth information and the semantic segmentation mask using the inverse projection of the pinhole camera model. Multi-frame semantic point clouds are then stitched into a globally consistent semantic point cloud based on the camera pose information. A Gaussian distribution model and voxel filtering are used to remove dynamic points and redundancy from the globally consistent point cloud. Finally, the semantic point cloud is converted into an octree form usable for navigation. The results show that, compared with a traditional sparse point cloud map, the dense point cloud map constructed in this thesis contains more detailed environmental information; after conversion to an octree, the map is easier to update and store, and it can be used directly for navigation, path planning, and other tasks. The high-performance semantic segmentation network also endows the map with richer semantic information. The whole system offers high tracking accuracy, good elimination of dynamic interference, robust mapping, and semantic awareness, laying a foundation for higher-level intelligent applications of visual SLAM.
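The reprojection error minimized in contribution (1) can be sketched for point features as follows. This is an illustrative sketch, not the thesis implementation: the line reprojection term (distance from projected 3D line endpoints to the detected 2D line) is omitted, and the intrinsics are made-up example values.

```python
import numpy as np

def project(K, R, t, pts3d):
    """Project 3D world points into the image with a pinhole model."""
    cam = (R @ pts3d.T + t.reshape(3, 1)).T   # world frame -> camera frame
    uv = (K @ cam.T).T                        # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]             # perspective divide

def point_reprojection_error(K, R, t, pts3d, obs2d):
    """Mean Euclidean distance between projections and observed keypoints."""
    pred = project(K, R, t, pts3d)
    return np.linalg.norm(pred - obs2d, axis=1).mean()

# Illustrative intrinsics (fx, fy, cx, cy) -- not values from the thesis.
K = np.array([[525.0,   0.0, 319.5],
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])
pts3d = np.array([[0.1, 0.2, 2.0],
                  [-0.3, 0.1, 3.0]])
# Synthetic observations generated from the ground-truth pose (R = I, t = 0),
# so the error at that pose is zero; pose optimization would minimize it.
obs = project(K, np.eye(3), np.zeros(3), pts3d)
err = point_reprojection_error(K, np.eye(3), np.zeros(3), pts3d, obs)
```

In a real system this residual would be fed to a nonlinear least-squares solver (e.g. Gauss-Newton over the pose) rather than evaluated once.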
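The balanced local/global attention fusion in contribution (2) is not specified in detail in the abstract. The following is a hypothetical sketch of one way such gated multi-scale fusion could look; all names (`fuse`, `upsample2x`) and the gating scheme are invented for illustration and are not the thesis architecture.

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def upsample2x(f):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return f.repeat(2, axis=1).repeat(2, axis=2)

def fuse(local_feat, global_feat):
    """Balanced fusion: per-channel gates decide the local/global mix."""
    g = upsample2x(global_feat)               # match the fine spatial size
    # Channel-wise gates from global average pooling of both branches.
    scores = np.stack([local_feat.mean(axis=(1, 2)),
                       g.mean(axis=(1, 2))])  # shape (2, C)
    w = softmax(scores, axis=0)               # gates sum to 1 per channel
    return w[0][:, None, None] * local_feat + w[1][:, None, None] * g

rng = np.random.default_rng(0)
local = rng.random((8, 16, 16))   # fine-scale branch features
glob = rng.random((8, 8, 8))      # coarse-scale branch features
fused = fuse(local, glob)         # fused map keeps the fine resolution
```

Because the gates form a convex combination per channel, the fused features stay within the range spanned by the two branches.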
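The epipolar-distance test used in contribution (3) to discriminate moving points can be illustrated as follows. The fundamental matrix here is synthetic (identity intrinsics, pure sideways translation) and the threshold is arbitrary; neither comes from the thesis.

```python
import numpy as np

def epipolar_distances(F, pts1, pts2):
    """Distance of each point in image 2 to its epipolar line l = F @ x1."""
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous coords
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    lines = (F @ x1.T).T                             # epipolar lines in image 2
    num = np.abs(np.sum(lines * x2, axis=1))         # |x2^T F x1|
    den = np.sqrt(lines[:, 0] ** 2 + lines[:, 1] ** 2)
    return num / den

def keep_static(F, pts1, pts2, thresh=0.1):
    """Retain matches whose epipolar distance stays below the threshold."""
    return epipolar_distances(F, pts1, pts2) < thresh

# For K = I and pure translation t = (1, 0, 0), F = E = [t]_x.
F = np.array([[0.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])
pts1 = np.array([[0.25, 0.10],
                 [0.25, 0.10]])
pts2 = np.array([[0.75, 0.10],    # consistent with camera motion (static)
                 [0.75, 0.30]])   # violates the constraint (moving object)
mask = keep_static(F, pts1, pts2)
# mask -> [True, False]
```

Points surviving both the semantic mask and this geometric check would then be used for pose estimation.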
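The inverse pinhole projection and voxel filtering stages of contribution (4) can be sketched as below. This is a minimal sketch under assumed illustrative intrinsics; the Gaussian-model dynamic-point removal and the octree conversion (typically done with a library such as OctoMap) are omitted.

```python
import numpy as np

def backproject(depth, K, labels):
    """Inverse pinhole projection: depth map + labels -> semantic point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    valid = pts[:, 2] > 0                  # drop pixels with no depth reading
    return pts[valid], labels.reshape(-1)[valid]

def voxel_filter(pts, labels, voxel=0.05):
    """Keep one point per occupied voxel to remove redundancy."""
    keys = np.floor(pts / voxel).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return pts[idx], labels[idx]

# Illustrative intrinsics and a synthetic scene: a flat wall 2 m away.
K = np.array([[525.0,   0.0, 319.5],
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])
depth = np.full((480, 640), 2.0)
labels = np.zeros((480, 640), dtype=np.int32)  # a single semantic class
pts, lab = backproject(depth, K, labels)
pts_f, lab_f = voxel_filter(pts, lab)          # far fewer points survive
```

Multi-frame clouds built this way would be transformed by the estimated camera poses before stitching, then inserted into the octree.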
Keywords/Search Tags:Visual simultaneous localization and mapping, Point-line features, Semantic segmentation, Pose estimation, Semantic mapping