Structure-guided Monocular Depth Estimation

Posted on:2023-10-04

Degree:Doctor

Type:Dissertation

Country:China

Candidate:X T Chen

Full Text:PDF

GTID:1528306902454514

Subject:Information and Communication Engineering

Abstract/Summary:

Monocular Depth Estimation (MDE) is one of the most challenging tasks in computer vision. It aims to predict the depth value of each pixel in a single color image. Because perspective projection maps 3D space onto a 2D image, the task is inherently ambiguous and ill-posed. In recent years, deep learning has greatly improved the performance of MDE. However, three challenges remain: (1) Depth maps predicted by Deep Convolutional Neural Networks (DCNNs) lack fine details of scene structure. (2) It is difficult to obtain sufficient annotated depth data for network training. (3) MDE networks generalize poorly.

For complex scenes, structure reflects information such as scene layout, object shape, and relationships between objects, all of which are crucial for scene understanding and 3D reconstruction. Structural information can therefore serve as a powerful prior for MDE. Furthermore, structural information, which is more essential for spatial description, can be shared among different domains. This universality can guide MDE networks toward stronger generalization. This thesis therefore studies structure-guided MDE methods to address the three challenges above. The main work of this thesis covers the following three aspects:

· Structure-Aware Supervised Learning for MDE. Effectively recovering multi-scale geometric structures in depth estimation for complex scenes is very challenging, even with a large amount of RGB-D training data. We propose a layer-by-layer residual prediction scheme and design the Laplacian Pyramid Structured Depth Estimation Network (LAP-Net) to recover the geometric information at each scale of a scene. To reduce information loss, we propose an adaptive dense feature fusion module that recovers both the overall scene structure and fine object boundaries. The method achieves state-of-the-art average accuracy on the public RGB-D datasets NYUD-V2 and KITTI, and the predicted depth maps recover fine scene structure and object details.

· Generalizable Structural Representation Learning for MDE. To relieve the heavy burden and high cost of acquiring depth data for deep learning, synthetic data can replace real data when training depth estimation networks. However, generalizing from synthetic to real data is a challenge. Based on the observation that structural information is highly versatile and can be shared between synthetic and real data of the same scene, we study how to use structural information to reduce domain shift and thereby improve the generalization ability of MDE networks. We propose to extract structural information through image disentanglement. By learning generalizable, depth-specific structural representations that reveal the essential spatial features, the proposed method improves the generalization ability of MDE. Without using any real-world data, the proposed method still outperforms domain adaptation methods on multiple real-scene datasets, including NYUD-V2, KITTI, and Cityscapes, which verifies that the learned structural representation generalizes well.

· Scene-Adaptive MDE. Generalizing a depth estimation network trained on synthetic data to real data from different scenarios is even more challenging. When the scenes of the synthetic data and the real data differ, domain shift is caused not only by differences in style but also by differences in scene structure. To overcome this limitation, we study a scene-adaptive method for generalizing from synthetic data to real data. We train the network on synthetic data of mixed scenes, but the large differences between scene structures within the domain make it difficult for training to converge to the global optimum. We propose to learn a Scene-Adaptive Structural Representation, which alleviates both the domain shift caused by style differences between domains and the training difficulties caused by structural differences among scenes within a domain. Quantitative and qualitative experimental comparisons on real datasets from various scenarios demonstrate the effectiveness and generalization ability of the proposed method.
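
The layer-by-layer residual scheme named in the first contribution builds on the classical Laplacian pyramid, in which a signal is split into a coarse base plus one residual per scale and then rebuilt from coarse to fine. The sketch below illustrates only that classical decomposition on a depth map; it is not the LAP-Net architecture, which the abstract does not specify. The function names, the 2x2 average pooling, and the nearest-neighbour upsampling are all assumptions chosen for brevity.

```python
import numpy as np


def downsample(x):
    """Halve resolution by averaging each 2x2 block (assumes even height and width)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(x):
    """Double resolution by nearest-neighbour repetition."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def build_laplacian_pyramid(depth, levels):
    """Return [residual_0, ..., residual_{levels-1}, coarsest], finest scale first."""
    pyramid = []
    current = depth
    for _ in range(levels):
        coarse = downsample(current)
        # The residual holds the detail lost when moving to the coarser scale.
        pyramid.append(current - upsample(coarse))
        current = coarse
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Rebuild the depth map from coarse to fine by adding one residual per scale."""
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        current = upsample(current) + residual
    return current


# Hypothetical 16x16 depth map in metres, used only to exercise the functions.
rng = np.random.default_rng(0)
depth = rng.uniform(0.5, 10.0, size=(16, 16))
pyramid = build_laplacian_pyramid(depth, levels=3)
restored = reconstruct(pyramid)
```

In a residual prediction network of this kind, one would expect each residual to be predicted by a network stage rather than computed from ground truth; here the residuals come directly from the input so that the coarse-to-fine reconstruction can be checked numerically.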

Keywords/Search Tags:

Monocular Depth Estimation, Supervised Learning, Domain Generalization, Scene Structure, Representation Learning

Related items

1	Research And Application Of Scene Depth Estimation Method For Monocular Image With Self-supervised Mechanism
2	Research On Monocular Depth Estimation Based On Self-supervised Learning
3	Deep Learning Based Monocular Scene Depth Estimation Algorithm
4	Deep Learning-based Depth Estimation Method For Outdoor Scenes
5	Research On Monocular Depth Estimation By Multi-task Learning
6	Monocular Depth Estimation And Reconstruction For Indoor Scenes Based On Deep Learning Methods
7	Research On Self-supervised Monocular Depth Estimation Method Based On Attention Mechanism
8	Research On Technology Of Real-scene Light-field Content Generation By Perceiving Image Depth
9	Depth Estimation From Monocular Image Based On Deep Convolutional Neural Networks
10	Monocular Image Depth Estimation Based On Deep Learning