RGB-D SLAM refers to the process of estimating a camera's pose and constructing a map of the environment simultaneously, using an RGB-D camera as the sensor. Compared with monocular and stereo cameras, which must compute the three-dimensional coordinates of space points algorithmically, an RGB-D camera acquires the three-dimensional information of space points directly and conveniently, so RGB-D SLAM has become a research focus in the visual SLAM field. The two major goals of SLAM are positioning and mapping, and this paper studies a SLAM algorithm based on an RGB-D camera.

Because the measurement principle and structure of an RGB-D camera differ from those of monocular or binocular cameras, this paper first introduces the depth measurement principle and visual geometry model of the RGB-D camera, and then studies camera motion estimation and nonlinear optimization in RGB-D SLAM to lay a foundation for the subsequent mapping. Secondly, for the application background of robot positioning, navigation, and 3D scene reconstruction, common mapping methods are studied, including sparse landmark point maps, dense point cloud maps, and octree grid maps. Finally, experiments are performed on the TUM RGB-D dataset and on field data; the results are satisfactory, and the accuracy of the positioning and mapping algorithm is measured to verify the effectiveness of the three-dimensional reconstruction algorithm.

Traditional maps are built from points or voxels and carry no object label information, which prevents robots from gaining a richer understanding of the map and performing higher-level tasks. To address this, this paper applies a deep residual neural network, combined with dilated (atrous) convolution and spatial pyramid pooling, to image semantic segmentation in order to help the SLAM system construct three-dimensional semantic maps. Firstly, real-time RGB-D SLAM is performed on the environment to obtain the key frame
numbers and pose information. Then, 2D semantic segmentation of the keyframes is performed offline with a deep convolutional neural network (DCNN). Finally, according to the keyframe pose information, the 2D segmentation images are projected into three-dimensional space and stitched together to build a dense three-dimensional semantic map, and a fully connected conditional random field (CRF) is used to refine the segmentation. Experiments are performed on the dataset and on field data, the segmentation accuracy is analyzed, and the validity of the semantic map construction algorithm is verified. The research in this paper has positive significance for robot positioning, navigation, obstacle avoidance, and human-computer interaction.
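The direct depth acquisition that motivates RGB-D SLAM rests on the pinhole back-projection, which turns a pixel with a depth reading into a 3D point in the camera frame. A minimal sketch, using the commonly cited Kinect default intrinsics as illustrative values (not parameters from this paper):

```python
# Pinhole back-projection sketch: pixel (u, v) + metric depth -> camera-frame 3D point.
# FX, FY, CX, CY are assumed default Kinect intrinsics, purely illustrative.
FX, FY = 525.0, 525.0   # focal lengths in pixels (assumed)
CX, CY = 319.5, 239.5   # principal point (assumed)

def backproject(u, v, depth_m):
    """Map pixel (u, v) with metric depth to a 3D point in the camera frame."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return (x, y, depth_m)
```

For example, a pixel at the principal point with 1 m depth maps to `backproject(319.5, 239.5, 1.0)`, i.e. `(0.0, 0.0, 1.0)` on the optical axis.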
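The step of projecting 2D segmentation images into 3D space via keyframe poses can be sketched as follows; the intrinsics, the 4x4 pose format, and the majority-vote label fusion are illustrative assumptions, not the thesis's exact implementation:

```python
# Hedged sketch: lift a labeled keyframe pixel into the world frame using the
# keyframe pose, then fuse per-voxel labels by majority vote.
# Intrinsics and pose convention below are assumptions, not from the paper.
FX, FY, CX, CY = 525.0, 525.0, 319.5, 239.5  # assumed pinhole intrinsics

def pixel_to_world(u, v, depth_m, pose):
    """pose: 4x4 row-major camera-to-world transform as nested lists."""
    # Back-project into the camera frame, then apply rotation and translation.
    pc = ((u - CX) * depth_m / FX, (v - CY) * depth_m / FY, depth_m)
    return tuple(
        pose[r][0] * pc[0] + pose[r][1] * pc[1] + pose[r][2] * pc[2] + pose[r][3]
        for r in range(3)
    )

def fuse_labels(votes):
    """votes: mapping voxel_key -> list of semantic labels seen across keyframes."""
    return {k: max(set(v), key=v.count) for k, v in votes.items()}
```

With an identity pose the world point equals the camera-frame point, and a voxel observed as "chair" in two keyframes and "table" in one is fused to "chair".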
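Positioning accuracy on the TUM RGB-D dataset is typically reported as absolute trajectory error (ATE). A minimal RMSE sketch, assuming the two trajectories are already time-associated and expressed in the same frame (the full TUM benchmark additionally performs a rigid alignment, omitted here):

```python
import math

def ate_rmse(gt, est):
    """RMSE of absolute trajectory error over paired (x, y, z) positions.

    gt, est: equal-length lists of camera positions in metres, assumed
    time-associated and aligned beforehand.
    """
    assert len(gt) == len(est) and gt
    se = sum((g - e) ** 2 for p, q in zip(gt, est) for g, e in zip(p, q))
    return math.sqrt(se / len(gt))
```

For instance, a single pose offset by (0.3, 0.4, 0.0) m from ground truth gives an ATE RMSE of 0.5 m.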