| Capturing high-quality dense depth maps is a crucial step in scene perception and 3D reconstruction, and it is an active research direction in computer vision. Monocular scene depth inference is the task of obtaining high-quality depth information from a single image. Because the problem is ill-posed and a single image carries limited information, it is difficult and challenging. According to the type of camera data, i.e., degraded depth image or color image, monocular scene depth inference methods can be further divided into monocular depth recovery and monocular depth estimation. Methods for monocular depth recovery usually suffer from problems such as imprecise model description and non-convex objective functions. In contrast, methods for monocular depth estimation often need a large number of depth labels because they lack geometric constraints. Thus, to solve the problem of monocular scene depth inference from heterogeneous camera data, this thesis proposes an image decomposition model based on a sparsity-promoting prior, and an unsupervised monocular depth estimation method that combines prior knowledge with network structure optimization, to obtain high-quality scene depth maps. The main contents are: 1. Image decomposition model based on a sparsity-promoting prior for monocular depth recovery. Existing methods for monocular depth recovery often fail to capture the essential characteristics of a depth image in different regions and cannot guarantee the convexity of the objective function, which leads to a complex solving process. Thus, from the perspective of signal decomposition, a depth image is divided into smooth regions and step-discontinuity regions, and a least-squares polynomial and a sparsity-promoting prior are used to fit these two regions, respectively, to establish a more accurate optimization model. Although the proposed sparsity-promoting prior based on the Moreau envelope is non-convex, this thesis proves the convexity of the
whole objective function with respect to each variable under some mild conditions. The proposed model is solved by the proximal gradient (PG) method combined with the alternating direction method of multipliers (ADMM), and the corresponding convergence analysis is also given. Finally, an accelerated algorithm is provided to reduce the running time in the testing phase. Extensive experiments on the Middlebury dataset under noise, missing depth values, and low resolution demonstrate that the proposed method achieves better results than other methods, which verifies the effectiveness of the proposed model and algorithm. 2. Research on unsupervised monocular depth estimation combined with prior knowledge. At present, supervised monocular depth estimation methods usually need a large number of ground-truth depth labels as training data. In contrast, unsupervised monocular depth estimation methods overcome this shortcoming by exploiting stereo image pairs and monocular videos during training, and infer a depth map from a monocular input image in the testing phase. However, because they lack the supervision of real depth labels, their numerical results and visual quality are usually inferior to those of supervised monocular depth estimation methods. Therefore, this thesis proposes to incorporate prior knowledge of natural scenes, including hand-crafted and learnable priors, to improve accuracy and visual quality. The proposed method uses an attention mechanism and rectangular convolution to capture feature dependencies, and designs a geometry-aware loss function to model the relationship between the color image and the predicted depth map. A learned composite proximal operator is proposed to refine the initial depth maps by simulating the proximal operator of a variational model. Qualitative and quantitative experiments on the KITTI dataset show that the proposed method outperforms the
existing unsupervised methods. The experimental results on the Make3D dataset show that the proposed method has better generalization performance. 3. Research on depth refinement and network structure optimization for unsupervised monocular depth estimation. Because the many pooling and down-sampling operations in the network structure lose feature information, depth inference results from a single network are usually unsatisfactory. Moreover, the structure of existing unsupervised monocular networks is often limited, i.e., it cannot fully exploit the stereo information available during training. To deal with these problems, this thesis first proposes a depth refinement method based on a cascaded network structure. This cascaded structure captures complementary features at all levels and improves feature representation in a coarse-to-fine manner. Next, a novel network structure that combines a monocular network and a stereo network is proposed. Depending on the input mode, it accepts either a monocular image or a stereo image pair as input in the testing phase. To further improve the performance of the monocular network, this thesis proposes a new distillation mechanism that uses information from the stereo network during training to assist monocular depth prediction, so that the unsupervised monocular depth estimation network can learn more accurate geometric knowledge. Meanwhile, a recursive network strategy and a feature-driven adaptive refinement module are used in the stereo network to enhance its inference performance and its ability to assist the learning of the monocular network. Extensive qualitative and quantitative experiments on the KITTI dataset demonstrate that the proposed method outperforms existing unsupervised learning methods and even some supervised learning methods for monocular depth estimation, which verifies its effectiveness. |
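
The abstract's claim that a non-convex, Moreau-envelope-based prior can still yield a convex objective can be illustrated with a scalar toy problem. The sketch below is not the thesis's model: it assumes the prior takes the simplest form of this kind, the absolute value minus its Moreau envelope, paired with a quadratic data term. The names `moreau_env_abs`, `mc_penalty`, `objective`, and the parameters `lam` and `gamma` are all hypothetical. Under these assumptions the objective is convex when `lam <= gamma`, which the code checks numerically with discrete second differences.

```python
import numpy as np

def moreau_env_abs(x, gamma):
    """Moreau envelope of |.| with parameter gamma (this is the Huber function)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= gamma,
                    x ** 2 / (2.0 * gamma),
                    np.abs(x) - gamma / 2.0)

def mc_penalty(x, gamma):
    """Assumed sparsity-promoting penalty: |x| minus its Moreau envelope (non-convex)."""
    return np.abs(x) - moreau_env_abs(x, gamma)

def objective(x, y, lam, gamma):
    """Toy scalar objective: quadratic data term plus the weighted penalty."""
    return 0.5 * (y - x) ** 2 + lam * mc_penalty(x, gamma)

def min_second_difference(values):
    """Smallest discrete second difference; a negative value signals non-convexity on the grid."""
    return float(np.min(values[2:] - 2.0 * values[1:-1] + values[:-2]))

grid = np.linspace(-3.0, 3.0, 601)
y, gamma = 1.0, 1.0

# The penalty alone is non-convex (negative curvature for 0 < |x| < gamma).
penalty_curv = min_second_difference(mc_penalty(grid, gamma))
# With lam <= gamma the quadratic term dominates and the sum is convex.
convex_curv = min_second_difference(objective(grid, y, lam=0.8, gamma=gamma))
# With lam > gamma the sum loses convexity.
nonconvex_curv = min_second_difference(objective(grid, y, lam=1.5, gamma=gamma))

print(penalty_curv < 0, convex_curv >= -1e-12, nonconvex_curv < 0)
```

The condition `lam <= gamma` stands in for the "mild conditions" mentioned above only in this toy setting; the conditions for the thesis's full multi-variable model are not stated in the abstract.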