Depth Recovery Of Monocular Video Based On Neural Convolution Networks

Posted on: 2018-12-27
Degree: Master
Type: Thesis
Country: China
Candidate: Q C Chen
GTID: 2348330512473677
Subject: Engineering
Abstract/Summary:
Depth map estimation from monocular images is not only a key component of 2D-to-3D movie conversion, but also benefits many challenging computer vision problems such as object recognition, semantic segmentation, and pose estimation. In recent years, depth estimation approaches based on deep convolutional neural networks (CNNs) have attracted considerable research interest because of their strong generalization, accuracy, and estimation efficiency.

In the first part of the thesis, a fully convolutional network (FCN) is applied to the problem of depth estimation from single images. An FCN produces an output of the same size as the input image and is therefore well suited to pixel-level prediction tasks such as segmentation and depth estimation. In addition, FCNs contain no fully connected layers, which drastically reduces the number of network parameters and makes the network less prone to over-fitting. To deal with the blurred depth maps caused by the drastic up-sampling in the last layers of the FCN, we design a multi-scale network that takes feature maps from different intermediate layers and fuses them after up-sampling each by a different ratio. Experiments on the NYU v2 dataset show that this method achieves high depth estimation accuracy.

When dealing with video-based applications such as 2D-to-3D video conversion, existing CNN-based depth estimation approaches tend to produce temporally inconsistent depth maps, because their CNN models are optimized over single frames. In the second part of the thesis, we address this problem by introducing a novel spatial-temporal Conditional Random Field (CRF) model into the deep CNN (DCNN) architecture, which enforces temporal consistency between depth map estimates over consecutive video frames. In our approach, temporally consistent superpixels (TSP) are first computed on an image sequence to establish correspondences between targets in consecutive frames. A DCNN is then used to regress the depth value of each temporal superpixel, followed by a spatial-temporal CRF layer that models the relationships among the estimated depths in both the spatial and temporal domains. The parameters of the DCNN and CRF models are jointly optimized with back-propagation. Experimental results show that our approach not only significantly enhances the temporal consistency of the estimated depth maps compared with existing single-frame approaches, but also improves depth estimation accuracy in terms of various evaluation metrics.

However, CNN-based approaches cannot always generate satisfactory depth maps ready for use. For example, in areas of an RGB image where color and texture change suddenly, the estimated depth map also tends to be discontinuous even where it should not be. To deal with this problem, the third part of this thesis proposes an efficient interactive depth map refinement approach that enables the user to correct inaccurate depth maps produced by a CNN with minor effort. In our approach, the user links two points of the estimated depth map to tell the system that the depth values at these points should be consistent; this link information is then propagated to the similarity graph of the whole image by affinity propagation. The depth map is then re-estimated using the updated similarity graph. The process is repeated until a satisfactory depth map is obtained.
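
The multi-scale fusion described in the first part of the abstract can be illustrated with a minimal NumPy sketch. The abstract does not state which up-sampling operator or fusion rule the thesis uses, so nearest-neighbour up-sampling and plain averaging are assumptions made here for illustration, and the function names `upsample_nearest` and `fuse_multiscale` are hypothetical.

```python
import numpy as np


def upsample_nearest(feature_map, ratio):
    """Nearest-neighbour up-sampling of a 2-D map by an integer ratio."""
    return np.repeat(np.repeat(feature_map, ratio, axis=0), ratio, axis=1)


def fuse_multiscale(feature_maps, out_size):
    """Up-sample each square map to out_size x out_size, then average them.

    Averaging is an assumed fusion rule; a trained network would more
    likely learn the fusion weights.
    """
    upsampled = []
    for fmap in feature_maps:
        ratio, remainder = divmod(out_size, fmap.shape[0])
        if remainder != 0 or fmap.shape[0] != fmap.shape[1]:
            raise ValueError("each map must be square and divide out_size evenly")
        # Coarser maps need a larger up-sampling ratio to reach out_size.
        upsampled.append(upsample_nearest(fmap, ratio))
    return np.mean(upsampled, axis=0)


# Hypothetical predictions taken from three layers at different resolutions.
coarse = np.full((2, 2), 4.0)   # up-sampled by ratio 4
middle = np.full((4, 4), 2.0)   # up-sampled by ratio 2
fine = np.full((8, 8), 3.0)     # up-sampled by ratio 1
fused = fuse_multiscale([coarse, middle, fine], out_size=8)
```

The point of the sketch is only that maps from different depths of the network reach a common resolution through different up-sampling ratios before being combined, so that fine layers can restore detail lost in the coarse ones.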
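
The interactive refinement loop in the third part of the abstract (link two points, update the similarity graph, re-estimate the depth) can be sketched as a small quadratic smoothing problem. This is a simplified illustration and not the thesis's method: the energy function, the `smoothness` and `strength` parameters, and the function names are assumptions, and the user link is written directly into one graph edge instead of being spread over the whole graph by affinity propagation.

```python
import numpy as np


def refine_depth(initial_depth, similarity, smoothness=1.0):
    """Re-estimate depths on a similarity graph.

    Minimises  sum_i (d_i - z_i)^2 + smoothness * sum_{i<j} W_ij (d_i - d_j)^2,
    whose closed-form solution is (I + smoothness * L) d = z with the graph
    Laplacian L = D - W. This energy is an assumed stand-in.
    """
    laplacian = np.diag(similarity.sum(axis=1)) - similarity
    system = np.eye(len(initial_depth)) + smoothness * laplacian
    return np.linalg.solve(system, initial_depth)


def add_user_link(similarity, p, q, strength=10.0):
    """Return a copy of the graph with a strong edge between nodes p and q."""
    updated = similarity.copy()
    updated[p, q] = updated[q, p] = strength
    return updated


# Four hypothetical image regions in a chain. The weak edge between regions
# 1 and 2 mimics a colour/texture change that wrongly splits the depth.
z = np.array([1.0, 1.2, 3.0, 3.2])
W = np.zeros((4, 4))
W[0, 1] = W[1, 0] = 1.0
W[1, 2] = W[2, 1] = 0.1
W[2, 3] = W[3, 2] = 1.0

before = refine_depth(z, W)
after = refine_depth(z, add_user_link(W, 1, 2))
gap_before = abs(before[1] - before[2])
gap_after = abs(after[1] - after[2])
```

With the user link in place, `gap_after` is smaller than `gap_before`, which is the behaviour the abstract describes: the linked points are pulled toward a consistent depth. In an iterative session, the user would add further links and call `refine_depth` again until the map looks right.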
Keywords/Search Tags:Image Depth Estimation, Deep Learning, Spatial-Temporal Continuity, 2D/3D Conversion, Depth Image Restoration