Image depth estimation plays a vital role in virtual reality, autonomous driving, and 3D reconstruction. Because a single image carries limited depth information on its own, estimating depth from it is difficult. The rapid development of deep learning has opened up a new research direction for single-image depth estimation. In this thesis, two single-image depth estimation methods based on deep learning networks are investigated.

To address the low accuracy of traditional methods in single-image depth estimation, a depth estimation network based on a DenseNet fully convolutional encoder-decoder is constructed. The encoder extracts image features with DenseNet. The decoder redesigns the up-sampling structure on the basis of two existing up-sampling methods, adopting fast convolution and fast mapping as a new up-sampling scheme. This reduces the number of network parameters and improves the detail of the predicted depth map. Experimental results show that the proposed method is effective for single-image depth estimation.

Using only an encoder-decoder structure for single-image depth estimation yields low estimation accuracy and poor depth-map quality. To address this, an encoder-decoder model that fuses a Transformer with a CNN is proposed. First, ResNet-50 serves as the backbone of the encoder-decoder network to extract image features. A multi-level fusion method merges the features from every encoder level and feeds the result to the decoder, improving the use of multi-scale feature information in the network. Second, a Transformer network analyzes the decoder's output features globally: its multi-head attention mechanism estimates depth information from the deep features produced by the decoder. This strengthens the network's extraction of multi-scale features and thus improves the accuracy of the predicted depth map.

Both models are trained and evaluated on the NYU Depth v2 dataset. The results are assessed with three evaluation metrics: mean relative error, root mean square error, and threshold accuracy. On the δ<1.25 threshold accuracy, the DenseNet fully convolutional encoder-decoder model proposed in this thesis scores about 11.3% higher than the "multi-scale network" model. On the same δ<1.25 metric, the Transformer-CNN fusion model scores about 8.4% higher than the "depth attention" model. In summary, both proposed models achieve good results on the single-image depth estimation task.
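In the Transformer-CNN model, the decoder receives features from every encoder level through multi-level fusion. The abstract does not specify the fusion operator, so this is only a minimal sketch under one common assumption. Each level is resized to the largest spatial size with nearest-neighbour sampling, and the levels are then concatenated along the channel axis. The names `upsample_nearest` and `fuse_multilevel` are hypothetical, and the code does not reproduce the thesis's implementation. Plain nested lists stand in for tensors so the example runs without a deep-learning framework.

```python
from typing import List

# Feature map indexed as [channel][row][col]; a stand-in for a C x H x W tensor.
FeatureMap = List[List[List[float]]]


def upsample_nearest(fmap: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Resize every channel to (out_h, out_w) by nearest-neighbour sampling."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [
        [[ch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
         for r in range(out_h)]
        for ch in fmap
    ]


def fuse_multilevel(levels: List[FeatureMap]) -> FeatureMap:
    """Assumed fusion rule (hypothetical): resize all encoder levels to the
    largest spatial size, then stack their channels into one feature map."""
    out_h = max(len(f[0]) for f in levels)
    out_w = max(len(f[0][0]) for f in levels)
    fused: FeatureMap = []
    for fmap in levels:
        fused.extend(upsample_nearest(fmap, out_h, out_w))
    return fused


# Usage: a shallow 1 x 4 x 4 level and a deep 2 x 2 x 2 level.
shallow = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
deep = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
fused = fuse_multilevel([shallow, deep])
print(len(fused), len(fused[0]), len(fused[0][0]))  # 3 4 4
```

A real network would typically pass the concatenated channels through a learned convolution (for example a 1×1 convolution) before the decoder. Bilinear interpolation is also a common alternative to nearest-neighbour resizing.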
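In the second model, a Transformer applies multi-head attention to the decoder's output features. The following is a minimal sketch of multi-head scaled dot-product self-attention, softmax(QKᵀ/√d_h)·V computed separately per head. It assumes the decoder output has been flattened into N tokens (one per spatial position) of dimension D. To keep the example short and framework-free, the learned query, key, value, and output projections of a real Transformer are replaced by the identity, which is an illustrative simplification. The function name `multi_head_self_attention` is hypothetical, and this is not the thesis's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]  # N tokens x D features


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(tokens: Matrix, num_heads: int) -> Matrix:
    """Self-attention with the feature dimension split evenly across heads.

    Simplification (assumption): Q = K = V = the head's slice of the input,
    i.e. the learned W_Q, W_K, W_V and output projection are the identity.
    """
    n, d = len(tokens), len(tokens[0])
    if d % num_heads:
        raise ValueError("feature dimension must be divisible by num_heads")
    dh = d // num_heads
    out = [[0.0] * d for _ in range(n)]
    for h in range(num_heads):
        lo, hi = h * dh, (h + 1) * dh
        sub = [row[lo:hi] for row in tokens]  # this head's features
        for i in range(n):
            # Scaled dot-product scores of token i against every token j.
            scores = [sum(a * b for a, b in zip(sub[i], sub[j])) / math.sqrt(dh)
                      for j in range(n)]
            weights = softmax(scores)
            # Output is the attention-weighted average of the value vectors.
            for k in range(dh):
                out[i][lo + k] = sum(weights[j] * sub[j][k] for j in range(n))
    return out


# Usage: three 2-dimensional tokens, two heads of size 1.
print(multi_head_self_attention([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]], 2))
```

For an H×W decoder map with D channels, the tokens would be the H·W per-pixel feature vectors. The attended output can then be reshaped back to H×W before a depth-regression head. Because the cost grows as N², such modules are usually applied to low-resolution deep features.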
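The three evaluation metrics are commonly defined as follows. Let d̂_i be the predicted depth and d_i the ground-truth depth, over N valid pixels:

- Mean relative error: (1/N)·Σ|d̂_i − d_i|/d_i.
- RMSE: sqrt((1/N)·Σ(d̂_i − d_i)²).
- Threshold accuracy: the fraction of pixels with max(d̂_i/d_i, d_i/d̂_i) < 1.25.

The sketch below assumes these standard definitions and strictly positive depths. The thesis's exact protocol (valid-pixel masking, depth capping, image cropping) is not given in the abstract. The function name `depth_metrics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def depth_metrics(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """Mean relative error, RMSE and threshold accuracy (delta < 1.25).

    Assumes pred and gt are flattened lists of valid, strictly positive depths.
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    if any(p <= 0 for p in pred) or any(g <= 0 for g in gt):
        raise ValueError("depths must be strictly positive")
    n = len(gt)
    rel = sum(abs(p - g) / g for p, g in zip(pred, gt)) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / n)
    # A pixel counts as correct when the larger of the two depth ratios is below 1.25.
    delta1 = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < 1.25) / n
    return {"rel": rel, "rmse": rmse, "delta1": delta1}


m = depth_metrics([1.0, 2.2, 4.0], [1.0, 2.5, 2.0])
print({k: round(v, 4) for k, v in m.items()})  # {'rel': 0.3733, 'rmse': 1.1676, 'delta1': 0.6667}
```

Higher is better for δ<1.25 and lower is better for the two error metrics. The abstract does not say whether a reported gain on δ<1.25 is an absolute or a relative difference.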