Font Size: a A A

3D Semantic Mapping Based On Deep Learning And Monocular Visual SLAM

Posted on:2019-12-10Degree:MasterType:Thesis
Country:ChinaCandidate:H X AoFull Text:PDF
GTID:2428330596961314Subject:Precision machinery and instruments
Abstract/Summary:PDF Full Text Request
With the development of artificial intelligence,machines are beginning to replace humans to accomplish certain tasks,such as sweeping robots,driverless cars,and industrial robots.The perception and understanding of environment is the base and requirement for automation of intelligent systems.Sensors,such as lasers and cameras,are used as "eyes" of machine to capture surrounding information and computers are used as "a brain" to understand the scene.Especially,the visual data is particularly rich in environmental information.How to use them to help machine to understand scenes as a way of human beings is a critical problem in computer vision and artificial intelligence.To solve this problem,this paper uses monocular visual data with SLAM and deep learning to establish a 3D semantic scene model,which presents 3D and semantic information of scene,and guides machines to work autonomously.This paper focuses on visual information to build 3D semantic scene model.First of all,the visual SLAM technology is discussed.The monocular visual “LSD-SLAM” algorithm is selected for building 3D map.Second,we discusses the semantic segmentation technology based on deep learning,analyzes the compression and optimization of deep convolutional neural networks,and proposes a high-performance semantic segmentation network.Finally,semantic segmentation is merged with LSD-SLAM algorithm to convert the 2D image and semantic information into 3D space,and the 3D semantic model is constructed and optimized.In the research of 2D image semantic segmentation,aiming at the efficiency deficiency of traditional semantic segmentation models,we study on compression method of convolutional neural network model,and compresssemantic segmentation model by depth-wise separable convolution and multi-size-kernel fusion convolution.In addition,the multi-scale features are combined with the modules as the pyramid pooling and the optimized residual model.Then,a high-performance semantic segmentation network was designed.With the experimental evaluation,it has been proved that the model achieves a good performance in the model size,speed and accuracy.In the research of 3D semantic scene modeling,the semantic segmentation is used to semantically label the scene at the pixel level,and then combined with LSD-SLAM algorithm.This is aimed to calculate and optimize the depth of semantic frame and convert labels from 2D to 3D.Finally a 3D map with semantic information is constructed.In this process,the Bayesian method is used to fuse semantic tags,and the entire 3D semantic model is globally optimized by dense Conditional Random Field(CRF).Experiments show that the method can accurately and effectively construct the 3D semantic model under both the KITTI data set and laboratory private data,and our method can run in real time at a frequency of 20 Hz on the experimental platform.
Keywords/Search Tags:computer vision, monocular visual SLAM, deep learning, semantic segmentation
PDF Full Text Request
Related items