Research On Intelligent Perception, Understanding And Interaction Technology Of Indoor Scene

Posted on: 2024-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: H W Zhu
Full Text: PDF
GTID: 2568307154998679
Subject: Software engineering
Abstract/Summary:
With the development of 3D vision, intelligent indoor scene perception, understanding, and interaction technologies have become a focus of research in both academia and industry. These technologies are widely used in fields such as intelligent manufacturing, augmented reality, intelligent robots, and games. In many consumer-level applications, untrained users can easily use them to scan and analyze scenes or for entertainment. In professional systems such as 3D manufacturing, virtual maintenance, and robotics, scene perception, understanding, and interaction often play a core role.

Scene perception, understanding, and interaction technologies can be divided into three parts. First, scene perception is the process by which a computer observes the world and constructs a digital model of a real scene; classical 3D reconstruction algorithms are the mainstream approach. In current research, the quality of scene perception is mainly limited by the speed and accuracy of 3D reconstruction and by the storage space required for 3D models. Second, scene understanding is the process by which a computer analyzes the constructed digital scene model and recognizes the objects in it. It relies mainly on learning-based 2D or 3D segmentation methods, and existing work is mainly limited by recognition accuracy and speed. Third, scene interaction is the interaction between the user and the virtual digital model; it is the bridge between scene perception and understanding technology and specific tasks. This thesis therefore addresses the difficulties of existing scene perception, understanding, and interaction technology. Specifically, its research content and innovations are as follows:

(1) This thesis proposes a real-time scene perception and understanding framework. Using a consumer-level RGB-D camera, it combines 3D reconstruction and 2D segmentation to construct a 3D model of an indoor scene that carries semantic information. To improve the quality of scene perception and understanding, the framework uses the BundleFusion 3D reconstruction framework and the Mask R-CNN 2D instance segmentation network to recognize objects in 2D images online during reconstruction, and fuses the semantic information of the objects into the reconstructed scene model. Compared with the traditional approach of reconstructing first and segmenting afterwards, this method has higher recognition accuracy and faster reconstruction speed, and shows higher segmentation accuracy on the public ScanNet dataset. Compared with existing work that combines 3D reconstruction and instance segmentation, it also achieves higher reconstruction accuracy on the public TUM dataset.

(2) Existing 3D reconstruction systems based on traditional algorithms all rely on GPU parallel computing to improve running speed, which makes reconstruction accuracy dependent on the amount of available GPU memory. To improve reconstruction accuracy with limited GPU resources, this thesis proposes a multi-resolution implicit 3D scene representation that divides the real world into non-uniform grids and uses a truncated signed distance function (TSDF) to express the reconstruction information. Voxel blocks of different densities store the TSDF values: high-density voxel blocks are used for recognized objects in the scene to achieve higher reconstruction accuracy, while low-density voxel blocks are used for the rest of the scene to save GPU memory. To visualize the reconstruction in real time, a rendering method based on the multi-resolution voxel model is also designed. Compared with mainstream 3D reconstruction methods, this method shows higher reconstruction accuracy on the YCB dataset and achieves better visualization results.

(3) Scene interaction technology has important application value in virtual display, interactive education, and industrial design. For example, it can be used for restoring damaged scenes, virtual display of examples in teaching, virtual maintenance, virtual training, and the design of complex industrial products. This thesis designs a new scene interaction framework. On the one hand, it recognizes interaction behaviors between people and objects in dynamic scenes and tracks dynamic objects. On the other hand, it establishes a 3D model library of common indoor items in five main categories: animals, food, model toys, industrial parts, and daily necessities. The model library supports adding, deleting, and modifying reconstructed objects.

(4) For moving objects, RAFT is used to estimate optical flow between adjacent frames. The motion residuals of objects in the semantic map are computed from the optical flow, and dynamic rigid objects are identified and tracked. Feature points on dynamic rigid objects are then removed, leaving stable feature points for computing camera poses, which yields more accurate camera pose estimation and 3D reconstruction. In the ablation experiment, this strategy achieved higher accuracy in camera pose optimization than removing object feature points using prior information alone.
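
The dynamic-object test in contribution (4) can be sketched as follows. The abstract does not say how the motion residual is defined, so this is only a minimal illustration under stated assumptions: the residual of an object is taken to be the mean magnitude of the difference between the observed optical flow (for example from RAFT) and the flow that camera motion alone would induce, averaged over the object's mask. The names `motion_residual` and `stable_features` and the 1.0-pixel threshold are hypothetical, and the flow fields are plain Python lists rather than real network output.

```python
import math


def motion_residual(observed_flow, camera_flow, mask):
    """Mean flow residual (pixels) over the pixels where mask is True.

    observed_flow / camera_flow: lists of (du, dv) per pixel, same order.
    mask: list of bools marking the pixels that belong to one object.
    Assumption: residual = |observed flow - camera-induced flow|.
    """
    total, count = 0.0, 0
    for (ou, ov), (cu, cv), inside in zip(observed_flow, camera_flow, mask):
        if inside:
            total += math.hypot(ou - cu, ov - cv)
            count += 1
    return total / count if count else 0.0


def stable_features(features, object_masks, residuals, threshold=1.0):
    """Drop feature points that lie on objects judged to be dynamic.

    features: list of pixel indices of detected feature points.
    object_masks: dict object_id -> set of pixel indices.
    residuals: dict object_id -> motion residual in pixels.
    threshold: assumed cut-off; a real system would tune this value.
    """
    dynamic_pixels = set()
    for obj_id, pixels in object_masks.items():
        if residuals[obj_id] > threshold:
            dynamic_pixels |= pixels
    return [f for f in features if f not in dynamic_pixels]


# Toy example: 4 pixels; pixels 0-1 belong to a cup, 2-3 to a table.
observed = [(3.0, 4.0), (3.0, 4.0), (0.5, 0.0), (0.5, 0.0)]
camera = [(0.0, 0.0), (0.0, 0.0), (0.5, 0.0), (0.5, 0.0)]
masks = {"cup": {0, 1}, "table": {2, 3}}
residuals = {
    name: motion_residual(observed, camera, [i in px for i in range(4)])
    for name, px in masks.items()
}
kept = stable_features([0, 2, 3], masks, residuals)
```

In this toy setup the cup moves independently of the camera and is flagged as dynamic, so only the feature points on the table remain available for pose estimation.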
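
The multi-resolution TSDF storage in contribution (2) can be illustrated in a similar spirit. The sketch below uses the standard weighted running-average TSDF update, which the abstract does not confirm as the exact rule used in the thesis, and chooses a voxel size per block depending on whether the block covers a recognized object. The voxel sizes, block size, truncation distance, and all function names are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Voxel:
    tsdf: float = 1.0    # normalized truncated signed distance in [-1, 1]
    weight: float = 0.0  # accumulated observation weight


def integrate(voxel, sdf, trunc, obs_weight=1.0, max_weight=64.0):
    """Fuse one signed-distance observation (metres) into a voxel."""
    if sdf < -trunc:
        return voxel  # far behind the observed surface: leave unchanged
    d = min(1.0, sdf / trunc)  # truncate and normalize to [-1, 1]
    total = voxel.weight + obs_weight
    voxel.tsdf = (voxel.tsdf * voxel.weight + d * obs_weight) / total
    voxel.weight = min(total, max_weight)  # cap so new data still counts
    return voxel


def voxel_size_for(is_recognized_object, fine=0.005, coarse=0.02):
    """Assumed policy: fine voxels for recognized objects, coarse elsewhere."""
    return fine if is_recognized_object else coarse


def voxels_per_block(block_side, voxel_size):
    """Number of voxels in a cubic block with the given side length."""
    n = round(block_side / voxel_size)
    return n ** 3


# Fuse two observations of the same voxel with a 4 cm truncation band.
v = Voxel()
integrate(v, sdf=0.01, trunc=0.04)
integrate(v, sdf=0.03, trunc=0.04)

# Memory footprint of a 16 cm block at the two assumed resolutions.
fine_count = voxels_per_block(0.16, voxel_size_for(True))
coarse_count = voxels_per_block(0.16, voxel_size_for(False))
```

With these assumed sizes, a fine block holds 64 times as many voxels as a coarse block of the same extent, which is the kind of saving that motivates reserving high density for recognized objects.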
Keywords/Search Tags: Scene perception, Scene understanding, Scene interaction, Multi-resolution, Dynamic scene