
Research on Stereo Depth Estimation for Autonomous Driving Scenarios

Posted on: 2023-05-01 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: X W Yang
GTID: 1522307037990779 | Subject: Mechanical Manufacturing and Automation
Abstract:
Environmental depth perception is a key component of autonomous driving and the basis for the path planning and decision control of autonomous vehicles. Compared with sensors such as lidar and millimeter-wave radar, binocular vision offers rich image information, low cost, and easy deployment, and is widely used in autonomous driving systems. In recent years, with the development of deep learning, binocular depth estimation methods based on convolutional neural networks have greatly improved on traditional methods in both depth estimation accuracy and inference speed. However, many problems in complex road scenes still merit thorough study. Depth predictions are inaccurate in inherently ill-posed regions such as textureless regions, weakly textured areas, repeated patterns, object boundaries, and occluded areas; networks generalize poorly to unseen driving scenes; and network models are too complex and too slow to meet the requirements of in-vehicle embedded devices. Building on deep learning and image processing techniques, this dissertation studies binocular depth estimation in terms of feature extraction, matching cost construction, cost aggregation, and disparity optimization. The main contributions are summarized as follows:

Firstly, to address the tendency of binocular depth estimation algorithms to mismatch in textureless regions, weakly textured areas, and repeated patterns in driving scenes, a binocular depth estimation network based on pixel attention and channel attention mechanisms is proposed. In the feature extraction stage, the network combines the pixel attention mechanism with a pyramid convolution module to extract more global context information and high-level features with pixel attention. In the cost aggregation stage, an attention aggregation module fuses high-level semantic features with low-level texture features to reduce the information loss in the encoder-decoder structure. A channel attention mechanism is introduced to identify the features that benefit disparity calculation and to increase their weights, guiding the low-level features to recover more detail. A stacked hourglass structure then learns more context information to regularize the matching cost volume. In addition, a 3D residual optimization module is introduced to prevent vanishing gradients, exploding gradients, and network degradation, and to improve the stability of the model during training. Experimental results show that introducing attention mechanisms into the binocular depth estimation network significantly improves depth estimation accuracy in textureless regions, weakly textured areas, and repeated patterns in autonomous driving scenes.

Secondly, to address inaccurate depth estimation at the edges of target objects on autonomous driving roads, a multi-task binocular depth estimation network based on edge detection and multi-scale matching cost volume fusion is proposed. In the feature extraction stage, an edge branch network extracts image features, and the learned edge geometric features are embedded into the disparity estimation branch to construct multi-scale matching cost volumes. In the cost aggregation stage, matching costs at different resolutions are fused to expand the receptive field of the model, capturing more global and structural representations for cost calculation. By adding a binary cross-entropy loss and an edge-aware smoothness loss to the disparity network's loss function, the disparity gradient at target boundaries is constrained, which improves disparity accuracy at those boundaries. In the disparity optimization stage, a left-right consistency check is introduced to construct a reconstruction error and correct errors in the initial disparity, and dilated convolutions enlarge the receptive field to provide more context information for disparity-discontinuous and occluded regions. Experimental results show that combining multi-scale matching cost volumes with edge geometric cues improves disparity estimation accuracy in occluded areas and near boundaries in autonomous driving scenes. The predicted disparity maps preserve sharp edges and avoid over-smoothing and blurring in depth-discontinuous regions.

Thirdly, to address the poor generalization of binocular depth estimation algorithms in unseen driving scenes, a stereo matching network based on a guided matching cost volume and transfer learning is proposed. The network uses the output of the feature extraction module to construct the matching cost volume and regularizes it with convolutions of different dilation rates and hourglass structures. Attention weights for the matching cost volume are obtained by compressing the channel dimension of the aggregated features, and these weights filter redundant information out of the initial concatenation cost volume to strengthen the similarity between the matching pixel and the candidate pixels. The model is first trained on the DrivingStereo dataset, and the learned parameters are then used to initialize the model for fine-tuning on the KITTI dataset; this transfer learning improves the robustness of disparity prediction across different driving scenarios. Experimental results show that the proposed guided matching cost volume represents the correlation between matching and candidate pixels well and improves the depth estimation performance of binocular perception for autonomous driving.

Finally, because complex binocular depth estimation models with many parameters are difficult to deploy on in-vehicle embedded devices with limited computing power and storage, a lightweight binocular depth estimation network based on multi-scale feature fusion and color guidance is proposed. The network builds its feature extraction backbone from MobileNetV2 blocks and extracts features by successively down-sampling the input. A channel attention module learns a weight for each feature map produced by a convolutional layer according to that map's contribution to disparity calculation, then applies these weights to amplify the feature maps that contribute strongly to stereo matching and suppress those that contribute little. A multi-scale feature fusion module fuses features from different stages to obtain more effective global context information for constructing a Euclidean-distance matching cost volume. In the disparity optimization stage, left-right consistency checking and color guidance refine the initial disparity, improving the model's sensitivity to object edges and its recovery of edge detail in the disparity map without increasing computation. Experimental results show that the proposed lightweight method predicts disparity quickly and accurately with limited computing resources, which meets the needs of practical applications.
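The abstract moves freely between "depth" and "disparity" because, for a rectified stereo pair, metric depth follows from disparity as Z = f * B / d, where f is the focal length in pixels and B is the camera baseline. A minimal NumPy sketch of that conversion, assuming a rectified pair; `focal_px` and `baseline_m` are hypothetical calibration inputs, not values from the dissertation:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (pixels) to depth (meters) via Z = f * B / d.

    Assumes a rectified stereo pair. Pixels with (near-)zero disparity
    have no finite depth and are returned as inf.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > eps
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```

With f = 700 px and B = 0.5 m, a disparity of 35 px corresponds to 10 m. Because depth is inversely proportional to disparity, a one-pixel disparity error shifts the depth of distant objects far more than that of nearby ones, which is one reason accuracy in hard regions matters for driving.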
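The first and fourth contributions both use channel attention that learns one weight per feature map and rescales the maps by those weights. The abstract does not give the exact design, so the sketch below assumes a squeeze-and-excitation style module (global average pooling, a bottleneck MLP, and a sigmoid). The weight matrices `w1` and `w2` would be learned in a real network; here they are plain arrays passed in by the caller:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(features, w1, b1, w2, b2):
    """Squeeze-and-excitation style channel attention (assumed design).

    features: (C, H, W) feature maps from one convolutional layer.
    w1: (C // r, C), b1: (C // r,), w2: (C, C // r), b2: (C,).
    Returns the reweighted features and the per-channel weights.
    """
    # Squeeze: summarize each channel by its global average.
    squeezed = features.mean(axis=(1, 2))          # (C,)
    # Excitation: bottleneck MLP with ReLU, then sigmoid -> weights in (0, 1).
    hidden = np.maximum(0.0, w1 @ squeezed + b1)   # (C // r,)
    weights = sigmoid(w2 @ hidden + b2)            # (C,)
    # Reweight: scale each feature map by its learned importance.
    return features * weights[:, None, None], weights
```

With all weights and biases set to zero, every channel receives weight 0.5 and the output is half the input; training pushes the weights of informative channels toward 1 and the rest toward 0.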
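The second contribution adds an edge-aware smoothness loss so that disparity gradients are penalized less where the image itself has strong edges. The abstract does not state the formula; the sketch below assumes the widely used form in which first-order disparity gradients are weighted by exp(-|image gradient|), with image gradients averaged over color channels:

```python
import numpy as np

def edge_aware_smoothness(disparity, image):
    """Edge-aware smoothness loss (a common formulation, assumed here):
    mean(|dx D| * exp(-|dx I|)) + mean(|dy D| * exp(-|dy I|)).

    disparity: (H, W); image: (H, W, 3) with values in [0, 1].
    """
    d_dx = np.abs(np.diff(disparity, axis=1))           # (H, W-1)
    d_dy = np.abs(np.diff(disparity, axis=0))           # (H-1, W)
    i_dx = np.abs(np.diff(image, axis=1)).mean(axis=2)  # (H, W-1)
    i_dy = np.abs(np.diff(image, axis=0)).mean(axis=2)  # (H-1, W)
    # A disparity jump costs less where it coincides with an image edge.
    return (d_dx * np.exp(-i_dx)).mean() + (d_dy * np.exp(-i_dy)).mean()
```

A disparity step that lines up with an intensity step is penalized by a factor of exp(-1) less than the same step over a flat image, which is the behavior the dissertation relies on to keep object boundaries sharp.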
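Both the second and fourth contributions use a left-right consistency check in the disparity optimization stage. The abstract does not specify the threshold or how inconsistent pixels are corrected afterwards, so this sketch only shows the check itself, assuming disparity maps predicted for both views, nearest-pixel lookup, and a hypothetical 1-pixel threshold:

```python
import numpy as np

def left_right_consistency(disp_left, disp_right, threshold=1.0):
    """Boolean mask of left-image pixels that pass the left-right check.

    Left pixel (y, x) with disparity d corresponds to right pixel (y, x - d).
    It passes if that right pixel's disparity agrees with d to within
    `threshold` pixels and the correspondence falls inside the image.
    """
    h, w = disp_left.shape
    ys, xs = np.indices((h, w))
    # Nearest-pixel correspondence in the right image.
    x_right = np.round(xs - disp_left).astype(int)
    in_bounds = (x_right >= 0) & (x_right < w)
    d_right = disp_right[ys, np.clip(x_right, 0, w - 1)]
    consistent = np.abs(disp_left - d_right) <= threshold
    return consistent & in_bounds
```

Pixels that fail the check are typically occluded in one view or mismatched, so they are the natural candidates for the correction step that the dissertation applies afterwards.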
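The third contribution filters an initial concatenation cost volume with attention weights obtained by compressing the channel dimension of aggregated features. The sketch below is a simplified illustration under two assumptions: the concatenation volume uses the common (2C, D, H, W) layout, and the learned channel compression is replaced by a channel mean followed by a sigmoid. The `aggregated` input stands in for the output of the dilated-convolution and hourglass regularization and is hypothetical here:

```python
import numpy as np

def concat_volume(feat_l, feat_r, max_disp):
    """Concatenation cost volume of shape (2C, D, H, W).

    Slice d pairs left features at column x with right features at x - d;
    columns with x - d < 0 are left as zeros.
    """
    c, h, w = feat_l.shape
    assert 0 < max_disp <= w
    vol = np.zeros((2 * c, max_disp, h, w), dtype=feat_l.dtype)
    for d in range(max_disp):
        vol[:c, d, :, d:] = feat_l[:, :, d:]
        vol[c:, d, :, d:] = feat_r[:, :, : w - d]
    return vol

def attention_filter(volume, aggregated):
    """Scale every channel of `volume` by one weight per (d, y, x).

    aggregated: (C', D, H, W). The channel compression here is a plain
    mean plus sigmoid, a stand-in for learned 3D convolutions.
    """
    weights = 1.0 / (1.0 + np.exp(-aggregated.mean(axis=0)))  # (D, H, W)
    return volume * weights[None], weights
```

Weights near 0 suppress disparity candidates that the aggregated features consider unlikely, while weights near 1 keep the concatenated features intact for later regularization.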
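The fourth, lightweight network builds its matching cost volume from Euclidean distances between left and right features. A minimal sketch of that construction, assuming rectified (C, H, W) feature maps and a hypothetical maximum disparity `max_disp`; positions where x - d falls outside the image get infinite cost so that a winner-take-all argmin ignores them:

```python
import numpy as np

def euclidean_cost_volume(feat_l, feat_r, max_disp):
    """Cost volume of shape (D, H, W): L2 distance between the left feature
    vector at column x and the right feature vector at column x - d."""
    c, h, w = feat_l.shape
    assert 0 < max_disp <= w
    cost = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        diff = feat_l[:, :, d:] - feat_r[:, :, : w - d]   # (C, H, W - d)
        cost[d, :, d:] = np.sqrt((diff ** 2).sum(axis=0))
    return cost
```

Taking `cost.argmin(axis=0)` gives a winner-take-all disparity map. A trained network would normally regularize the volume before regressing disparity, and the abstract does not say which regression the dissertation uses. Compared with a concatenation volume, a distance volume has a single channel per disparity, which keeps memory and computation low on embedded hardware.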
Keywords: Autonomous driving, environment perception, deep learning, convolutional neural network, binocular depth estimation