Real-time Monocular Depth Estimation Based On AIoT Edge Device

Posted on: 2024-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: X H Liu
Full Text: PDF
GTID: 2568307067973739
Subject: New Generation Electronic Information Technology (including quantum technology, etc.) (Professional Degree)
Abstract/Summary:
Depth estimation is a key part of how computers understand 3D scenes and has a wide range of applications in artificial intelligence IoT (AIoT). IoT devices estimate scene depth from images to support tasks such as device navigation, path planning, and augmented reality. Low-power, real-time depth estimation on edge devices is therefore important for extending AIoT applications. However, most current depth estimation algorithms pursue highly accurate depth information, which places heavy demands on device hardware. To achieve high-performance depth estimation on edge devices, this thesis proposes a low-latency encoder and decoder and, by fusing different cue information with global information, achieves a good trade-off between the real-time performance and the accuracy of monocular depth estimation. The main innovations are as follows:

(1) A real-time monocular depth estimation method that fuses object structure information in a single pipeline is proposed. To address the scarcity of real data for depth estimation tasks, the model uses a style translation technique so that the network can extract the structural information of objects from images of different styles. The model can predict depth in real scenes even when trained only on virtual datasets. It also improves depth estimation performance by guiding the network to generate depth features through multi-scale structural information and global attention.

(2) A real-time monocular depth estimation method that merges global features is proposed. A hardware-friendly, low-latency encoder and decoder are designed for AI edge devices, and with them the model achieves real-time inference on edge devices. To counter the over-reliance of convolutional networks on local features, the Transformer Conv module, which combines convolution and Transformer, is proposed. By fusing global and local features, the model improves the accuracy of its depth prediction. The proposed method reaches an RMSE of 0.554, outperforming existing low-latency monocular depth estimation methods.

(3) To evaluate the effectiveness of the proposed networks, experiments were conducted on commonly used indoor datasets, and the models were deployed on Jetson Nano edge devices for field testing. The experiments show that the models achieve excellent performance on edge devices and outperform related approaches.
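
For readers unfamiliar with the RMSE figure quoted in the abstract, the following is a minimal sketch of how root-mean-square error is commonly computed for depth maps. It assumes the standard definition over pixels with valid ground truth; the thesis's exact evaluation protocol (depth cap, crop, units) is not stated here, and the function name `rmse`, the `min_depth` threshold, and the sample values are illustrative assumptions rather than anything taken from the thesis.

```python
import math

def rmse(pred, gt, min_depth=1e-3):
    """Root-mean-square error over pixels with valid ground truth.

    pred, gt:  flat sequences of depth values (metres assumed).
    min_depth: hypothetical validity threshold; pixels whose ground
               truth is not above it are skipped as missing.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    # Squared error for every pixel that has a usable ground-truth depth.
    sq_errors = [(p - g) ** 2 for p, g in zip(pred, gt) if g > min_depth]
    if not sq_errors:
        raise ValueError("no valid ground-truth pixels")
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Illustrative values only, not data from the thesis.
predicted = [1.5, 2.0, 2.5, 4.0]
ground_truth = [1.0, 2.0, 3.0, 0.0]  # last pixel has no valid depth
error = rmse(predicted, ground_truth)
```

Lower values are better, so a method improves on another when its RMSE is smaller.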
Keywords/Search Tags: AIoT, attention, real-time monocular depth estimation, transformer, style transformer