
Research On Monocular Depth Estimation Method Based On Feature Fusion

Posted on: 2024-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: Q C Wang
GTID: 2568307106482144
Subject: Electronic information
Abstract/Summary:
Monocular depth estimation is a crucial problem in computer vision. It aims to recover the depth of objects from a single image and is widely used in 3D reconstruction, autonomous driving, 3D object detection, and underwater image restoration. Compared with alternatives such as LiDAR, stereo depth estimation, and structured-light cameras, monocular depth estimation has the advantages of low cost, a simple system structure, and lower computational requirements. However, existing monocular depth estimation methods suffer from insufficient feature fusion and inadequate use of contextual information, which leads to low accuracy in the predicted depth maps. This thesis focuses on multi-scale feature fusion and contextual feature fusion in monocular depth estimation and proposes two methods to improve model performance and accuracy. The main work is as follows.

To address the varying requirements for depth features in different image regions and the difficulty of depth reconstruction, we propose a depth accumulation estimation method based on recursive feature fusion. In the encoder stage, the recursive feature fusion module uses gated recurrent units to select and fuse multi-scale features recursively, extracting features adapted to the needs of different image regions; these fused features replace the cross-layer connections. In the decoder stage, the depth accumulation estimation module decomposes depth reconstruction into multiple layers, each of which predicts a depth map with a different level of detail. Finally, these predictions are accumulated to generate the depth estimation result. Experimental results on indoor and outdoor benchmark datasets show that the proposed method is more robust and effective than other recent methods.

To address the inadequate use of contextual information and the insufficient fusion of global and local depth features in monocular depth estimation, we propose a monocular depth estimation method based on global-local context modulation. First, to extract richer global and local information, a Swin Transformer is used in the feature extraction stage to extract features with a global receptive field, and convolutional operations are used in the decoder stage to process the feature maps and enhance their local features. Then, in the connection between the encoder and decoder, a contextual selection module uses attention mechanisms to select contextual information and thereby fully fuse global and local features. In addition, to better refine the predicted depth map, the depth regression problem is transformed into an ordinal regression problem over a continuous depth range. The highest- and lowest-resolution feature maps are used to predict how the depth range is divided into intervals, and the final depth at each pixel is a linear combination of the interval centers weighted by that pixel's predicted probability for each interval. Experimental results on the KITTI and NYU Depth V2 datasets show that the proposed method achieves competitive results and predicts depth maps with higher accuracy.
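The final step of the second method, in which each pixel's depth is a probability-weighted combination of interval centers, can be illustrated with a minimal Python sketch. The function names (`bin_centers`, `pixel_depth`), the softmax normalization, the parameterization of intervals by relative widths, and the depth range in the example are assumptions made for illustration; the abstract does not specify these details, and this is not the thesis's implementation.

```python
import math


def bin_centers(widths, d_min, d_max):
    """Convert positive relative interval widths into interval centers.

    Assumption: the network outputs one positive width per interval, and
    the widths are rescaled so the intervals exactly tile [d_min, d_max].
    """
    total = sum(widths)
    centers = []
    left = d_min
    for w in widths:
        width = (d_max - d_min) * w / total
        centers.append(left + width / 2)
        left += width
    return centers


def softmax(logits):
    """Numerically stable softmax over a list of per-interval scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def pixel_depth(logits, centers):
    """Depth at one pixel: sum over intervals of probability * center."""
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, centers))


# Hypothetical example: three intervals over a 0 to 8 metre range.
centers = bin_centers([1, 1, 2], d_min=0.0, d_max=8.0)  # [1.0, 3.0, 6.0]
print(pixel_depth([0.0, 0.0, 0.0], centers))  # uniform probabilities, about 3.33
```

Because the output is an expectation over interval centers rather than the single most likely interval, the predicted depth varies continuously even though the intervals themselves are discrete.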
Keywords/Search Tags: monocular depth estimation, feature fusion, gated recurrent unit, contextual information