Image semantic segmentation refers to classifying each pixel in an image, assigning it a category label that associates it with a real-world object or concept. Disparity estimation, in turn, refers to finding the correspondence between all or part of the pixels in the two views of a binocular image and computing their offsets. Disparity can be readily converted into depth data (depth = focal length × baseline / disparity) to build a local 3D model of the scene. We refer to the combination of semantic segmentation and disparity estimation as 2.5D semantic segmentation. 2.5D semantic segmentation can be applied in many fields, such as scene perception for autonomous navigation robots, sensor positioning in augmented reality, and automatic analysis of radar images in defense and security. With the introduction of the Deep Convolutional Neural Network (DCNN), the performance of semantic segmentation and disparity estimation algorithms has achieved a major breakthrough. Building on the classification capability of the original DCNN, researchers have gradually extended it to image semantic segmentation and disparity estimation. In recent years, DCNN-based semantic segmentation networks have used the encoder-decoder structure as a basis to perform feature extraction and recover semantic detail. The disparity estimation task has undergone a transition from encoder-decoder structures, which focus on feature extraction and disparity detail restoration, to Siamese networks, which focus on feature extraction and feature matching. As network structures have become more complex and efficient, datasets have also been proposed in large numbers to advance related work.

We investigate and analyze existing scene understanding datasets and propose a large-scale binocular indoor scene understanding dataset, together with an end-to-end 2.5D semantic segmentation network based on it. We conduct in-depth research on three aspects: dataset generation, network model construction, and training strategies.

(1) Generation of a large-scale binocular scene understanding dataset: Based on currently published three-dimensional scene model datasets, we take the actual conditions of robot movement into account to guide a robot in performing global navigation within each scene. The navigation results are used to filter and sort scenes, and image rendering based on ray tracing is then performed in sequence. As a result, path planning is carried out for 5,414 scenes, and binocular RGB data from 222,778 poses in 312 scenes, together with binocular semantic segmentation and depth ground-truth labels, are rendered. Statistics show that the pixel distribution of the semantic segmentation labels is consistent with that of real datasets, ensuring the rationality of the dataset.

(2) Design and implementation of the 2.5D semantic segmentation network: Drawing on the design advantages of current state-of-the-art semantic segmentation and disparity estimation networks, we propose an end-to-end 2.5D semantic segmentation network that simultaneously outputs semantic segmentation and disparity estimation results. The network is divided into three parts: a feature extractor, a semantic segmentation branch, and a disparity estimation branch. The feature extractor is constructed from the residual units of Residual Networks (ResNet). The binocular input images are processed by the feature extractor in a Siamese configuration to obtain left- and right-image features at different resolutions. These binocular features are then processed by spatial pyramid pooling and a cost volume, respectively, to obtain multi-scale semantic segmentation features and disparity estimation features containing depth information. The two types of features are further processed by the semantic segmentation branch and the disparity estimation branch, which simultaneously output the semantic segmentation and disparity estimation results.

(3) Training strategies and experiments: Addressing the multi-task nature of the 2.5D semantic segmentation network, we determine the optimal training strategy and further introduce a multi-objective loss function to improve network performance. Through extensive ablation experiments, we confirm that the proposed dataset provides effective image information for training the semantic segmentation task, and we determine the optimal training strategy. Experiments also show that the proposed multi-objective loss function effectively improves the semantic segmentation metric, reaching 89.012%, while the disparity estimation error reaches 1.21 pixels. Furthermore, experiments show that the multi-objective loss function enables disparity estimation to be constrained by semantic supervision, suggesting directions for further research.
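To make the multi-objective loss function described above concrete, the following NumPy sketch combines a per-pixel cross-entropy term for segmentation with a smooth-L1 term for disparity regression under a weighted sum. The specific loss forms and the weights `w_seg` and `w_disp` are illustrative assumptions, not the exact formulation used in this work.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Segmentation loss; logits: (N, C) with pixels flattened, labels: (N,) class ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def smooth_l1(pred, target):
    """Huber-style regression loss, commonly used for disparity estimation."""
    diff = np.abs(pred - target)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).mean()

def multi_objective_loss(seg_logits, seg_labels, disp_pred, disp_gt,
                         w_seg=1.0, w_disp=1.0):
    """Weighted sum of the two task losses; w_seg and w_disp are
    hypothetical hyperparameters chosen for illustration."""
    return (w_seg * softmax_cross_entropy(seg_logits, seg_labels)
            + w_disp * smooth_l1(disp_pred, disp_gt))
```

In such a joint objective, gradients from the semantic term also flow back through the shared feature extractor, which is one mechanism by which disparity estimation can be constrained by semantic supervision.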