Research On Lightweight Monocular Depth Estimation Algorithm And Its Implementation

Posted on: 2022-05-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X H Tu
GTID: 1488306731466834
Subject: Computer Science and Technology
Abstract/Summary:
Monocular depth estimation recovers the depth information of a scene from monocular images. It has long been an important problem in computer vision and is widely used in fields such as unmanned vehicles, unmanned aerial vehicles, and robots. Performing monocular depth estimation on embedded devices has become an urgent problem. Currently, most depth estimation algorithms rely on computing platforms such as high-performance servers. A few depth estimation algorithms run in real time on embedded devices, but their accuracy does not yet meet the requirements of embedded devices. In addition, existing depth estimation algorithms often neglect the trade-off between accuracy and delay, as well as the need to match different hardware architectures on diverse embedded devices. In response to these challenges, our main research content and contributions are as follows:

(1) To address the limited accuracy of lightweight monocular depth estimation on embedded devices, this thesis aims to greatly improve the accuracy. First, we design an encoder-decoder algorithm, DEM (a depth estimation model). DEM relies on a dual path network. Its encoder alleviates the defect that existing encoders cannot reuse and re-explore features at the same time; its decoder alleviates the defect that existing decoders cannot effectively learn local features. DEM thus improves the accuracy of monocular depth estimation. Second, to further improve the accuracy of DEM, this thesis proposes a loss function that uses the relative depth relationship to guide the training of DEM. Then, this thesis uses an existing optimization method to accelerate DEM on the TX2 GPU (graphics processing unit) computing platform without changing the accuracy. To alleviate the scale ambiguity problem in monocular SLAM (simultaneous localization and mapping) and to verify the accuracy of DEM in scene reconstruction, this thesis builds a SLAM system on DEM. The proposed SLAM system contains eight plug-and-play modules: DEM, feature detection, descriptor computation, feature matching, pose prediction, key frame extraction, loop closure detection, and pose-graph optimization. Each module can be flexibly replaced by an alternative. Extensive experiments demonstrate that the proposed DEM improves the accuracy of depth estimation; the proposed loss function improves accuracy by at least 0.8% compared with other loss functions; the optimized DEM reduces its inference delay, CPU (central processing unit) usage, GPU usage, power consumption, and energy consumption by 10.8%, 4.8%, 1.3%, 2.9%, and 13.9%, respectively, on the embedded GPU computing platform without losing accuracy (evaluated on the NYU-Depth-v2 test dataset); the DEM-based SLAM system reconstructs indoor and outdoor scenes more accurately; and applying DEM to LiDAR (light detection and ranging) super-resolution improves accuracy by at least 14.5% over previous methods.

(2) Facing the challenge of strictly limited computing resources for depth estimation on embedded CPU platforms, this thesis aims to greatly reduce the computing resource overhead. First, we propose a lightweight encoder-decoder architecture (EDA). EDA's encoder is designed to extract effective features in real time on embedded devices, relying on existing lightweight classification models; EDA's decoder is designed to output pixel-level, high-resolution depth maps, relying on convolution, transposed convolution, and pixel shuffling layers. Second, this thesis uses deep learning compiler technology to deploy, compile, and optimize EDA, further improving the memory usage, CPU usage, power consumption, and energy consumption of EDA on embedded CPU computing platforms while maintaining accuracy. In addition, this thesis develops a general framework for fast monocular depth estimation on actual embedded platforms and in real scenes. It then integrates depth estimation models into the robot operating system (ROS). Here, the depth information published by a depth estimation publisher can be used by other ROS nodes, so that robots perceive their environments better. Experiments verify that our methods achieve good results on actual embedded CPU computing platforms and in real scenes. For example, compared with popular algorithms, the optimized EDA has 57.5%, 16.1%, 10.9%, and 34.9% lower CPU latency, memory usage, CPU usage, and power consumption, respectively, on the TX2 CPU computing platform, with 0.4% higher accuracy than the others (evaluated on the NYU-Depth-v2 test dataset).

(3) This thesis first proposes a monocular depth estimation algorithm (MDE) that trades off computational complexity against the accuracy of depth estimation on embedded computing platforms. Second, according to the available computing resources of embedded devices, we design a pruning algorithm based on reinforcement learning to prune MDE. The pruning algorithm removes the redundant channels of MDE so that MDE automatically reaches the target pruning rate and its computational complexity reaches a threshold state. At the same time, this thesis proposes a reinforcement learning reward function that minimizes the accuracy loss as MDE automatically decreases its computational complexity. In addition, to perform monocular depth estimation on the different hardware architectures of different embedded devices, the thesis uses a compiler optimization method to match the pruned MDE to those architectures. Meanwhile, the optimization method reduces latency and power consumption without losing accuracy. Extensive experiments show that MDE achieves a trade-off between accuracy and delay; the pruning method reduces the inference latency, power consumption, and storage space; and the compilation and optimization method adapts MDE well to different hardware architectures. When the input is an RGB image of 228×912 on the KITTI dataset, the pruned and optimized MDE has 71.9%, 10.9%, and 0.3% better GPU runtime, accuracy, and power, respectively, than others on the Nano GPU computing platform.

Other vision tasks (such as pixel-level segmentation) can also learn from the above methods of balancing depth estimation accuracy against computational complexity, automatically adjusting computational complexity, and adapting to different hardware architectures, so that algorithms can be lightweight and deployed at scale on different embedded devices with different hardware architectures.
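
The abstract says that EDA's decoder in contribution (2) relies on pixel shuffling layers to produce high-resolution depth maps, but it does not describe the layer itself. As a minimal sketch of what a pixel shuffle (depth-to-space) step generally does, the pure-Python function below rearranges a tensor of shape (C·r², H, W) into (C, H·r, W·r). The function name `pixel_shuffle`, the nested-list representation, and the channel ordering (the common depth-to-space convention) are assumptions made for illustration, not details taken from the thesis.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Illustrative only: the channel ordering below follows the common
    depth-to-space convention and is an assumption, not the thesis's code.
    """
    c_in = len(x)
    h = len(x[0])
    w = len(x[0][0])
    if r < 1 or c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)

    # Allocate the upsampled output: C channels of size (H*r, W*r).
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(h * r):
            for j in range(w * r):
                # Each output pixel is copied from one of the r*r input
                # channels that belong to output channel c.
                src = c * r * r + (i % r) * r + (j % r)
                out[c][i][j] = x[src][i // r][j // r]
    return out
```

For example, four 1×1 input channels holding 1, 2, 3, and 4 become a single 2×2 channel `[[1, 2], [3, 4]]` when `r = 2`. The operation only moves values, so the upsampling itself adds no multiply-accumulate work, which is one plausible reason such a layer suits a lightweight decoder.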
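
Contribution (1) mentions a loss function that uses the relative depth relationship to guide training, without giving its form. The sketch below shows one generic way a relative-depth term can be written: a pairwise ranking loss over sampled point pairs. It is not the loss proposed in the thesis; the function name `pairwise_ranking_loss`, the label scheme (+1, -1, 0), and the choice of a softplus penalty are all assumptions made for illustration.

```python
import math


def pairwise_ranking_loss(pred, pairs):
    """Mean ranking loss over point pairs (generic sketch, not the thesis's loss).

    pred  : flat list of predicted depths
    pairs : list of (i, j, r), where r = +1 if point i should be farther
            than point j, -1 if closer, and 0 if roughly equal
            (hypothetical label scheme)
    """
    if not pairs:
        raise ValueError("at least one pair is required")
    total = 0.0
    for i, j, r in pairs:
        diff = pred[i] - pred[j]
        if r == 0:
            # Equal-depth pairs: penalize any predicted difference.
            total += diff * diff
        else:
            # softplus(-r * diff): small when the predicted order agrees
            # with the label, large when it contradicts it.
            total += math.log1p(math.exp(-r * diff))
    return total / len(pairs)
```

With `pred = [2.0, 1.0]`, the pair `(0, 1, 1)` (point 0 labeled farther) yields a smaller loss than the pair `(0, 1, -1)`, because the predicted ordering agrees with the first label and contradicts the second. In practice a term like this would usually be combined with a per-pixel regression loss.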
Keywords/Search Tags: embedded computing platforms, monocular depth estimation, lightweight, deep learning, convolutional neural network, optimization