Font Size: a A A

Research On Semantic Segmentation Technology Of Road Scene Based On Binocular Vision

Posted on:2021-03-14Degree:MasterType:Thesis
Country:ChinaCandidate:D ZhangFull Text:PDF
GTID:2512306512487254Subject:Pattern Recognition and Intelligent Systems
Abstract/Summary:PDF Full Text Request
Semantic segmentation is a technology to achieve pixel-wise classification of the scene,which has important application value in the field of assisted driving and autonomous navigation.Due to the complex diversity of road scenes and the difference of illumination,the color difference between different targets may be very weak,which makes it difficult to segment the target.Depth information,as an important feature to describe three-dimensional structures,can depict the geometric relationship between different targets,provide shape characteristics and positional relationships of targets,and help reduce the uncertainty of object recognition.At present,the main ways to obtain depth information are RGB-D cameras,LIDARs and binocular cameras.RGB-D cameras use structured light technology to obtain depth information,but they are susceptible to sunlight and the measurement range is too small,so they are only suitable for indoor environments.Although laser radar has a wide detection range and high measurement accuracy,it is expensive,has a short mechanical scanning life,and the point cloud information is sparse and uneven after it is projected onto the image coordinate system,which makes it hard to be widely applied.The binocular camera simulates human eyes and infers depth information based on the geometric relationship under multiple perspectives,which is more cost-effective.With the development of deep learning and parallel computing devices,it is no longer a problem to quickly calculate depth information from binocular images.Therefore,it is of great significance to study how to efficiently extract the depth information from binocular images,and fuse it with the visual information of images to improve the accuracy and reliability of semantic segmentation.Under this background,this thesis carries out the following exploratory research about semantic segmentation based on binocular vision:(1)We constructed a semantic segmentation framework that integrates binocular depth information.Although there are many depth estimation algorithms for binocular images,the accuracy,running speed,and flexibility of different algorithms are quite different.In this thesis,different algorithms are evaluated under specific dataset.In addition,in order to avoid too many network parameters after merging depth information,we optimized the fusion method between depth information and image information,and uses the similarity between different points in the depth map as a constraint to modify the traditional convolution and pooling operations.Experiments show that the method of depth information fusion improves the segmentation accuracy significantly compared with methods based on monocular image,and the amount of model parameters is not increased.(2)We proposed a semantic segmentation model based on binocular image and cross-level feature guidance.Considering that semantic segmentation and depth estimation require a lot of overlapping visual information,in order to improve the utilization of information,we incorporate the idea of stereo matching into the feature extraction stage,and uses the difference between binocular feature vectors as an implicit depth information.In addition,we modify the Global Attention Upsample module,use high-level features to redistribute low-level features,and enhances the discrimination of features.Experiments prove that this method greatly improves the operation efficiency and has better performance in accuracy and robustness.(3)We designed a multi-task framework based on the attention mechanism to complete semantic segmentation and disparity estimation tasks simultaneously.In order to balance the task of disparity estimation,which requires a large image resolution,and ensure that the parameters cannot be too high,we modify the Feature Pyramid Attention module to complete the extraction and fusion of multi-scale features.Aiming at the problem that most disparity estimation algorithms need to manually specify the maximum disparity value,we apply Parallax Attention Module to measure the similarity between all points on the same epipolar line,so as to get rid of the limitation of disparity search range through this global dependency.Experiments show that multi-task learning can guide the network to pay attention to imperceptible features and improve the segmentation accuracy.Since operations such as pooling reduce the resolution,the learning effect of the disparity estimation branch for small targets is poor,so the structure of disparity estimation branch needs to be further improved.
Keywords/Search Tags:semantic segmentation, binocular vision, convolutional network, attention, multi-task
PDF Full Text Request
Related items