
Visual Object Segmentation Based On Multi-modal Fusion

Posted on: 2024-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Lv
GTID: 2568306944963949
Subject: Mechanical engineering

Abstract:
With the wide application of intelligent robots across industries, research on visual tasks for intelligent robots has become a popular direction. Among these tasks, visual object segmentation helps intelligent robots understand and perceive scenes. As one of the fundamental visual tasks, it is used by intelligent robots to improve performance in application scenarios such as human-computer interaction and autonomous driving. Building on research results in 2D and 3D visual object segmentation, this thesis proposes a visual object segmentation method based on multimodal fusion. To facilitate the application of the proposed method in engineering practice, the thesis also designs a visual object segmentation system. The main contributions are as follows:

(1) The thesis studies the performance of several classical 2D semantic segmentation networks on indoor scene segmentation, taking multiple 2D images as input. Using the corresponding depth images, the 2D segmentation results are mapped into 3D space pixel by pixel, and indoor scene segmentation performance is then measured on the 3D point cloud (see the first sketch below). Based on the experimental results, UNet is selected as the backbone network for processing 2D modal data in the subsequent multimodal semantic segmentation task.

(2) The thesis studies the performance of several general 3D point cloud processing models on 3D point cloud object segmentation in indoor scenes, and PointNet++ is selected as the backbone network for processing 3D modal data in the subsequent multimodal semantic segmentation task. The thesis further explores how to fuse 2D and 3D information and demonstrates the effectiveness of multimodal fusion.

(3) Based on the results of the 2D and 3D visual object segmentation studies, a structure-aware fusion network for visual object segmentation is proposed, in which a structural deep metric learning method is designed over pixels and points to explore their relations and then adaptively map the images and point clouds into a common canonical space for prediction (see the second sketch below). The proposed method takes full advantage of the detailed appearance information in 2D images and the geometric information in 3D point clouds, and effectively fuses data from different modalities. Comparative experiments against existing mainstream algorithms verify the effectiveness of the method. In addition, the thesis designs an interactive visual object segmentation system that integrates the proposed methods and provides technical support for the practical application of visual object segmentation.
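The pixel-by-pixel mapping of 2D segmentation results into 3D space described in (1) is, in essence, a depth back-projection. The following is a minimal sketch assuming a pinhole camera model with intrinsics fx, fy, cx, cy; the function and variable names are illustrative and not taken from the thesis.

    import numpy as np

    def backproject_labels(depth, labels, fx, fy, cx, cy):
        """Map per-pixel semantic labels into 3D camera coordinates.

        depth  : (H, W) depth map in metres
        labels : (H, W) predicted class index per pixel (e.g. from a 2D network)
        fx, fy, cx, cy : pinhole camera intrinsics
        Returns an (N, 3) point cloud and the (N,) labels of valid pixels.
        """
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        valid = depth > 0                        # discard pixels with missing depth
        z = depth[valid]
        x = (u[valid] - cx) * z / fx             # pinhole back-projection
        y = (v[valid] - cy) * z / fy
        points = np.stack([x, y, z], axis=-1)
        return points, labels[valid]

Each labelled pixel with a valid depth reading becomes a labelled 3D point, so 2D predictions can be evaluated directly against point cloud ground truth, as done when selecting the 2D backbone.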
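The structure-aware fusion in (3) maps pixel and point features into a common canonical space with deep metric learning. The sketch below is a generic PyTorch illustration of cross-modal metric learning, using a simple projection head per modality and a margin-based pairwise loss; it is not the thesis's actual network or loss, and all names are hypothetical.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CanonicalSpaceHead(nn.Module):
        """Project 2D pixel features and 3D point features into one embedding space."""
        def __init__(self, dim_2d, dim_3d, dim_embed):
            super().__init__()
            self.proj_2d = nn.Linear(dim_2d, dim_embed)
            self.proj_3d = nn.Linear(dim_3d, dim_embed)

        def forward(self, pixel_feats, point_feats):
            # L2-normalise so distances are comparable across modalities
            e2d = F.normalize(self.proj_2d(pixel_feats), dim=-1)
            e3d = F.normalize(self.proj_3d(point_feats), dim=-1)
            return e2d, e3d

    def pairwise_metric_loss(e2d, e3d, labels_2d, labels_3d, margin=0.5):
        """Pull pixel/point embeddings of the same class together, push others apart."""
        dist = torch.cdist(e2d, e3d)                          # (N_pix, N_pts) distances
        same = labels_2d.unsqueeze(1) == labels_3d.unsqueeze(0)
        pos = dist[same].mean() if same.any() else dist.new_zeros(())
        neg = F.relu(margin - dist[~same]).mean() if (~same).any() else dist.new_zeros(())
        return pos + neg

Normalising both modalities into the same embedding space lets per-class relations between pixels and points drive the fusion, which is the general idea behind mapping images and point clouds into a common canonical space for prediction.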
Keywords/Search Tags:visual object segmentation, multimodal fusion, 2D semantic segmentation, 3D point cloud processing