
Structure-guided Monocular Depth Estimation

Posted on: 2023-10-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X T Chen
Full Text: PDF
GTID: 1528306902454514
Subject: Information and Communication Engineering
Abstract/Summary:
Monocular Depth Estimation (MDE) is one of the most challenging tasks in computer vision. It aims to predict the depth value of each pixel in a single color image. Because perspective projection maps 3D space onto a 2D image, the task is inherently ambiguous and ill-posed. In recent years, deep learning has greatly improved the performance of MDE. However, three challenges remain: (1) depth maps predicted by Deep Convolutional Neural Networks (DCNNs) lack fine details of scene structure; (2) it is difficult to obtain enough annotated depth data for network training; (3) MDE networks generalize poorly.

In complex scenes, structure reflects information such as scene layout, object shape, and the relationships between objects, all of which are crucial for scene understanding and 3D reconstruction. Structural information can therefore serve as a powerful prior for MDE. Furthermore, because structural information is more fundamental to spatial description, it can be shared across domains, and this universality can guide an MDE network toward stronger generalization. This thesis therefore studies structure-guided MDE methods to address the three challenges above. The main work of this thesis covers the following three aspects:

· Structure-Aware Supervised Learning for MDE. Effectively recovering multi-scale geometric structure in depth estimation for complex scenes is very challenging, even with a large amount of RGB-D training data. We propose a layer-by-layer residual prediction scheme and design the Laplacian Pyramid Structured Depth Estimation Network (LAP-Net) to recover the geometric information of a scene at each scale. To reduce information loss, an adaptive dense feature fusion module is proposed to recover both the overall scene structure and fine object boundaries. The method achieves state-of-the-art average accuracy on the public RGB-D datasets NYUD-V2 and KITTI, and the predicted depth maps recover fine scene structure and object details.

· Generalizable Structural Representation Learning for MDE. To relieve the heavy burden and high cost of acquiring depth data for deep learning, synthetic data can replace real data when training depth estimation networks. However, generalizing from synthetic to real data is a challenge. Based on the observation that structural information is highly general and can be shared between synthetic and real data of the same scene, we study how to use structural information to reduce domain shift and thereby improve the generalization ability of MDE networks. We propose to extract structural information through image disentanglement. By learning generalizable, depth-specific structural representations that reveal the essential spatial features, the proposed method improves the generalization ability of MDE. Without using any real-world data, it still outperforms domain adaptation methods on multiple real-scene datasets, including NYUD-V2, KITTI, and Cityscapes, which verifies that the learned structural representation generalizes well.

· Scene-Adaptive MDE. Generalizing a depth estimation network trained on synthetic data to real data from different scenarios is even more challenging: when the scenes in the synthetic and real data differ, domain shift is caused not only by differences in style but also by differences in scene structure. To overcome this limitation, we study a scene-adaptive method for generalizing from synthetic to real data. We train the network on synthetic data of mixed scenes, but the large differences between scene structures within the domain make it difficult for training to converge to the global optimum. We propose to learn a Scene-Adaptive Structural Representation, which alleviates both the domain shift caused by style differences between domains and the training difficulties caused by structural differences among scenes within a domain. Quantitative and qualitative comparisons on real datasets from various scenarios demonstrate the effectiveness and generalization ability of the proposed method.
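
The layer-by-layer residual idea behind a Laplacian pyramid can be illustrated with a small sketch. This is not LAP-Net, whose architecture the abstract does not specify; it only shows, on a toy depth map, how a map splits into a coarse base plus one residual per scale and how adding the residuals back from coarse to fine recovers it. The function names, the 2x2 averaging, and the nearest-neighbour upsampling are assumptions chosen for simplicity.

```python
# Toy Laplacian-pyramid decomposition of a depth map in pure Python.
# All names and the choice of filters are illustrative assumptions,
# not taken from the thesis.

def downsample(img):
    """Halve resolution by averaging each 2x2 block (height and width must be even)."""
    h, w = len(img), len(img[0])
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]

def upsample(img):
    """Double resolution by repeating every value in a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def build_pyramid(depth, levels):
    """Return (coarsest map, residuals ordered from finest to coarsest)."""
    residuals = []
    current = depth
    for _ in range(levels):
        coarse = downsample(current)
        up = upsample(coarse)
        # Residual = detail at this scale that the coarser level cannot represent.
        residuals.append(
            [[c - u for c, u in zip(c_row, u_row)]
             for c_row, u_row in zip(current, up)]
        )
        current = coarse
    return current, residuals

def reconstruct(coarse, residuals):
    """Rebuild the full-resolution map by adding residuals back, coarse to fine."""
    current = coarse
    for res in reversed(residuals):
        up = upsample(current)
        current = [[u + r for u, r in zip(u_row, r_row)]
                   for u_row, r_row in zip(up, res)]
    return current

# A 4x4 toy depth map with one sharp depth discontinuity (the value 8.0).
depth = [
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
    [3.0, 3.0, 4.0, 8.0],
    [3.0, 3.0, 4.0, 4.0],
]
coarse, residuals = build_pyramid(depth, levels=2)
restored = reconstruct(coarse, residuals)
```

In this sketch the residuals are computed from a known depth map. In a learned, residual-prediction setting, a network would instead have to predict the residual at each scale from the image, so that fine structure such as the discontinuity above is added back one level at a time.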
Keywords/Search Tags: Monocular Depth Estimation, Supervised Learning, Domain Generalization, Scene Structure, Representation Learning