Automated driving technology is based on computer vision, photoelectric sensors, and GPS, and aims to control vehicles automatically and safely without human operation. Road scene understanding is the core problem on the computer vision side, as it improves a vehicle's cognitive ability with respect to its surrounding environment. Because of the complexity of real scenes, existing technology cannot yet achieve safe driving without supervision, so designing high-performance intelligent algorithms has become the focus and the main difficulty of scene understanding. This dissertation focuses on road scene understanding based on color cameras, thermal cameras, and LIDAR, designing binocular stereo matching and scene semantic segmentation algorithms to obtain spatial and semantic information of the road scene. The main content and contributions of this dissertation are as follows:

The Class Prototype Regression Network (CPRNet) is proposed to tackle the problem of segmenting complex environments in real road scenes. CPRNet regresses a feature prototype for each category by learning from a large amount of data, and the final result is computed from the similarities between the target features and the prototypes. The algorithm also uses a category prototype attention module and a spatial attention module to enhance the target features, improving both the spatial structure and the segmentation results. Compared with the baseline algorithm, the average accuracy of CPRNet is improved by 2.9%.

The Residual Pyramid Network (RPNet) is proposed to balance accuracy and speed on mobile computing platforms. RPNet uses feature residuals at different levels to estimate the target residuals and obtain the segmentation result. RPNet improves both efficiency and accuracy: on public datasets, the average accuracy and speed are improved by 6.4% and 63%, respectively.

The Self-Supervised Binocular Matching Network (SBMNet) is proposed to relieve the dependence of supervised learning-based binocular stereo algorithms on large amounts of labeled data. SBMNet uses the symmetry between the left and right images of a binocular pair to establish a self-supervised training framework. The training process also includes a perceptual constraint based on a perceptual loss, which makes the estimated disparity accurate in both structure and detail. Compared with algorithms of the same type, the overall average error rate of this algorithm on the KITTI dataset is reduced by 0.15%.

The Multi-View Template Matching Network (MTMNet) is proposed to tackle the poor structural characteristics of sparse LIDAR point cloud data in road scenes. MTMNet constructs a feature template for each category and classifies the target points by comparing the similarity between the target features and the template features. At the same time, MTMNet adopts Multi-View Convolution (MVC) and further builds a multi-layer multi-view convolution module to enhance the template features across multiple views and dimensions, thereby improving the matching accuracy.

In conclusion, this dissertation proposes binocular stereo matching and semantic segmentation algorithms based on self-supervised learning, metric learning, and residual learning, which effectively improve scene understanding ability and practical applicability.
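
To make the prototype-matching idea behind CPRNet (and, analogously, the feature-template matching in MTMNet) concrete, the following is a minimal PyTorch sketch of a prototype-based segmentation head: each pixel is classified by its cosine similarity to a learned per-class prototype vector. The class name, the random prototype initialization, and the temperature are illustrative assumptions; CPRNet's actual prototype regression and its prototype/spatial attention modules are not reproduced here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeSegHead(nn.Module):
    """Sketch of a class-prototype head: pixels are labeled by similarity
    to learned prototypes rather than by a conventional classifier layer."""
    def __init__(self, num_classes: int, feat_dim: int, temperature: float = 0.1):
        super().__init__()
        # One learnable prototype vector per semantic class (hypothetical initialization).
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.temperature = temperature

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) backbone features.
        feats = F.normalize(feats, dim=1)             # unit-length pixel features
        protos = F.normalize(self.prototypes, dim=1)  # unit-length class prototypes
        # Cosine similarity between every pixel and every prototype -> (B, K, H, W) logits.
        logits = torch.einsum("bchw,kc->bkhw", feats, protos) / self.temperature
        return logits

# Usage (hypothetical shapes): logits = PrototypeSegHead(19, 256)(backbone_features)

Training such a head with a standard cross-entropy loss pulls pixel features toward their class prototype, which is the metric-learning view of segmentation summarized above.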
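
The coarse-to-fine residual idea behind RPNet can be sketched in the same spirit: the coarsest level predicts full class logits, and each finer level predicts only a residual that corrects the upsampled coarser prediction. Channel sizes, layer choices, and names below are assumptions for illustration, not RPNet's actual configuration.

import torch.nn as nn
import torch.nn.functional as F

class ResidualPyramidHead(nn.Module):
    """Sketch of residual refinement over a feature pyramid."""
    def __init__(self, channels=(256, 128, 64), num_classes=19):
        super().__init__()
        self.coarse = nn.Conv2d(channels[0], num_classes, 1)
        self.residual_heads = nn.ModuleList(
            nn.Conv2d(c, num_classes, 1) for c in channels[1:]
        )

    def forward(self, feats):
        # feats: list of feature maps ordered from coarse (low-res) to fine (high-res).
        logits = self.coarse(feats[0])
        for feat, head in zip(feats[1:], self.residual_heads):
            logits = F.interpolate(logits, size=feat.shape[-2:],
                                   mode="bilinear", align_corners=False)
            logits = logits + head(feat)  # finer level only corrects the coarse estimate
        return logits

Because each refinement stage outputs a small residual instead of a full prediction, the per-level heads can stay lightweight, which is how this style of design trades accuracy against speed on mobile platforms.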
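
The self-supervised principle used by SBMNet can likewise be illustrated with a minimal photometric reconstruction loss: the right image is warped to the left view using the predicted left disparity, and the reconstruction error supervises the network without ground-truth disparity. SBMNet's full objective (left-right symmetry consistency and the perceptual loss on deep features) is not reproduced here; all names and details are illustrative assumptions.

import torch
import torch.nn.functional as F

def photometric_selfsup_loss(left, right, disp_left):
    """Sketch of a self-supervised stereo term.
    left, right: (B, 3, H, W) images in [0, 1]; disp_left: (B, 1, H, W) disparity in pixels."""
    b, _, h, w = left.shape
    # Build a sampling grid shifted horizontally by the predicted disparity.
    ys, xs = torch.meshgrid(torch.arange(h, device=left.device),
                            torch.arange(w, device=left.device), indexing="ij")
    xs = xs.float().unsqueeze(0) - disp_left.squeeze(1)       # shift columns by disparity
    ys = ys.float().unsqueeze(0).expand(b, -1, -1)
    grid = torch.stack([2 * xs / (w - 1) - 1,                 # normalize to [-1, 1]
                        2 * ys / (h - 1) - 1], dim=-1)
    left_rec = F.grid_sample(right, grid, align_corners=True) # reconstruct the left view
    return (left_rec - left).abs().mean()                     # L1 photometric error

A symmetric term for the right view, plus a feature-space (perceptual) comparison of the reconstruction, would bring this sketch closer to the training framework described above.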