
Visual Object Segmentation Based On Multi-modal Fusion

Posted on: 2024-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Lv
GTID: 2568306944963949
Subject: Mechanical engineering

Abstract:
With the wide application of intelligent robots across industries, research on visual tasks for intelligent robots has become a popular direction. Among these tasks, visual object segmentation helps intelligent robots understand and perceive scenes. As one of the fundamental visual tasks, it is used by intelligent robots to improve performance in application scenarios such as human-computer interaction and autonomous driving. Building on research results in 2D and 3D visual object segmentation, this thesis proposes a visual object segmentation method based on multimodal fusion. To facilitate the application of the proposed method in engineering practice, the thesis also designs a visual object segmentation system. The main contributions are as follows:

(1) The thesis studies the performance of several classical 2D semantic segmentation networks on indoor scene segmentation, taking multiple 2D images as input. Using the corresponding depth images, the 2D segmentation results are mapped into 3D space pixel by pixel, and indoor scene segmentation performance is then measured on the 3D point cloud (see the first sketch below). Based on the experimental results, UNet is selected as the backbone network for processing 2D modal data in the subsequent multimodal semantic segmentation task.

(2) The thesis studies the performance of several general 3D point cloud processing models on 3D point cloud object segmentation in indoor scenes, and PointNet++ is selected as the backbone network for processing 3D modal data in the subsequent multimodal semantic segmentation task. The thesis further explores how to fuse 2D and 3D information and demonstrates the effectiveness of multimodal fusion.

(3) Based on the results of the 2D and 3D visual object segmentation studies, a structure-aware fusion network for visual object segmentation is proposed, in which a structural deep metric learning method is designed over pixels and points to explore their relations and then adaptively map the images and point clouds into a common canonical space for prediction (see the second sketch below). The proposed method takes full advantage of the detailed appearance information in 2D images and the geometric information in 3D point clouds, and effectively fuses data from different modalities. Comparative experiments against existing mainstream algorithms verify the effectiveness of the method. In addition, the thesis designs an interactive visual object segmentation system that integrates the proposed methods and provides technical support for the practical application of visual object segmentation.
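The pixel-by-pixel mapping of 2D segmentation results into 3D space described in (1) is, in essence, a depth back-projection. The following is a minimal sketch assuming a pinhole camera model with intrinsics fx, fy, cx, cy; the function and variable names are illustrative and not taken from the thesis.

    import numpy as np

    def backproject_labels(depth, labels, fx, fy, cx, cy):
        """Map per-pixel semantic labels into 3D camera coordinates.

        depth  : (H, W) depth map in metres
        labels : (H, W) predicted class index per pixel (e.g. from a 2D network)
        fx, fy, cx, cy : pinhole camera intrinsics
        Returns an (N, 3) point cloud and the (N,) labels of valid pixels.
        """
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        valid = depth > 0                        # discard pixels with missing depth
        z = depth[valid]
        x = (u[valid] - cx) * z / fx             # pinhole back-projection
        y = (v[valid] - cy) * z / fy
        points = np.stack([x, y, z], axis=-1)
        return points, labels[valid]

Each labelled pixel with a valid depth reading becomes a labelled 3D point, so 2D predictions can be evaluated directly against point cloud ground truth, as done when selecting the 2D backbone.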
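The structure-aware fusion in (3) maps pixel and point features into a common canonical space with deep metric learning. The sketch below is a generic PyTorch illustration of cross-modal metric learning, using a simple projection head per modality and a margin-based pairwise loss; it is not the thesis's actual network or loss, and all names are hypothetical.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CanonicalSpaceHead(nn.Module):
        """Project 2D pixel features and 3D point features into one embedding space."""
        def __init__(self, dim_2d, dim_3d, dim_embed):
            super().__init__()
            self.proj_2d = nn.Linear(dim_2d, dim_embed)
            self.proj_3d = nn.Linear(dim_3d, dim_embed)

        def forward(self, pixel_feats, point_feats):
            # L2-normalise so distances are comparable across modalities
            e2d = F.normalize(self.proj_2d(pixel_feats), dim=-1)
            e3d = F.normalize(self.proj_3d(point_feats), dim=-1)
            return e2d, e3d

    def pairwise_metric_loss(e2d, e3d, labels_2d, labels_3d, margin=0.5):
        """Pull pixel/point embeddings of the same class together, push others apart."""
        dist = torch.cdist(e2d, e3d)                          # (N_pix, N_pts) distances
        same = labels_2d.unsqueeze(1) == labels_3d.unsqueeze(0)
        pos = dist[same].mean() if same.any() else dist.new_zeros(())
        neg = F.relu(margin - dist[~same]).mean() if (~same).any() else dist.new_zeros(())
        return pos + neg

Normalising both modalities into the same embedding space lets per-class relations between pixels and points drive the fusion, which is the general idea behind mapping images and point clouds into a common canonical space for prediction.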
Keywords/Search Tags:visual object segmentation, multimodal fusion, 2D semantic segmentation, 3D point cloud processing