
Research On Key Technologies Of Odometry Based On Deep Learning

Posted on: 2024-03-01 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: H R Zhao | Full Text: PDF
GTID: 1528306944975349 | Subject: Computer Science and Technology
Abstract/Summary:
With the increasing popularity of cameras and Inertial Measurement Units (IMUs), Visual Odometry (VO) and Visual-Inertial Odometry (VIO) are widely used in fields such as robotics, autonomous driving, and Augmented Reality (AR). In particular, monocular implementations avoid the degradation that stereo-camera systems suffer in long-range scenes, as well as the high cost and sensitivity to light interference of RGB-D cameras, and have become the mainstream form of relative positioning technology. In recent years, deep-learning-based odometry has predicted geometric correspondences directly from raw data, overcoming the heavy reliance of traditional methods on hand-crafted low-level features. However, the global information of images, the long-range dependencies between sequences, and dynamic objects in the environment hinder further improvements in the accuracy of deep visual odometry. Moreover, existing visual-inertial odometry systems have high computational complexity and strong coupling, and cannot meet the needs of heterogeneous devices. Reducing the coupling and computational overhead of a system while maintaining accuracy is therefore a challenging task.

This dissertation addresses major problems that have hindered the development of deep odometry by designing effective deep neural networks and frameworks. It focuses on three questions: how to achieve a generalized visual-inertial odometry, how to improve the accuracy of monocular deep visual odometry in dynamic environments, and how to effectively capture the global spatial and temporal dependencies in self-supervised monocular visual odometry. The specific research contents and contributions are summarized as follows:

1. To address the problems that traditional loosely coupled visual-inertial odometry requires calibration of parameters such as IMU noise and bias, while learning-based visual-inertial odometry suffers from higher coupling and lower generality, this dissertation proposes an End-to-End Loosely Coupled Visual-Inertial Odometry (EE-LCVIO). The system provides a fusion module composed of a one-dimensional convolutional neural network and a Long Short-Term Memory (LSTM) network to integrate the results of visual odometry and IMU integration. In addition, to address the inability of existing monocular deep visual odometry to exploit long-term temporal information, this dissertation designs a Visual Odometry with Spatial-Temporal Two-Stream Networks (TSVO). Experiments on different datasets show that TSVO can exploit the sequential information of 10 consecutive frames, and that EE-LCVIO achieves comparable pose accuracy with fewer parameters and higher robustness, running at up to 26 frames per second.

2. To address the decrease in camera pose estimation accuracy caused by dynamic objects in the environment, this dissertation proposes a dynamic object detection module that combines a multi-task learning network with a multi-view geometry algorithm. The module first uses the multi-task learning network to perform semantic segmentation and depth estimation on monocular images, and then removes objects whose motion states are known a priori or unknown through semantic information and geometric constraints (a sketch of this filtering step is given below). Additionally, to address the memory loss that occurs when existing monocular deep visual odometry extracts long-term motion information, this dissertation proposes a visual odometry optimized by a Graph Attention Network (GAT). Finally, compared with current classical learning-based methods, the proposed algorithm maintains promising generalization performance and reduces displacement error and rotation error by 30.3% and 64.3%, respectively, running at up to 40 frames per second on the GPU.
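The abstract does not give implementation details for the dynamic-object filtering, so the following PyTorch sketch is only an illustration of the general idea: combining a semantic prior over a-priori movable classes with a multi-view depth-consistency check. The function names (`dynamic_mask`, `backproject`), the class-ID set `MOVABLE_CLASS_IDS`, the threshold `depth_tol`, and the tensor layouts are assumptions for illustration, not the author's code.

```python
# Hypothetical sketch: mask out dynamic pixels by combining a semantic prior
# (classes that are movable a priori) with a multi-view geometric consistency
# check between two frames. Class IDs, thresholds, and layouts are illustrative.
import torch
import torch.nn.functional as F

MOVABLE_CLASS_IDS = {11, 12, 13}  # e.g. person, rider, car (assumed label map)

def backproject(depth, K_inv):
    """Lift a depth map (B,1,H,W) to camera-frame 3D points (B,3,H*W)."""
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()   # (3,H,W)
    pix = pix.view(1, 3, -1).expand(b, -1, -1).to(depth.device)       # (B,3,H*W)
    return (K_inv @ pix) * depth.view(b, 1, -1)

def dynamic_mask(depth_t, depth_t1, seg_t, T_t_to_t1, K, depth_tol=0.05):
    """Return a (B,1,H,W) mask that is 1 for pixels considered static.

    depth_t, depth_t1 : predicted depths of frames t and t+1, (B,1,H,W)
    seg_t             : per-pixel class ids of frame t, (B,1,H,W), long
    T_t_to_t1         : relative pose, (B,4,4);  K : intrinsics, (B,3,3)
    """
    b, _, h, w = depth_t.shape
    pts = backproject(depth_t, torch.inverse(K))                       # (B,3,H*W)
    pts_h = torch.cat([pts, torch.ones(b, 1, h * w, device=pts.device)], dim=1)
    pts_t1 = (T_t_to_t1 @ pts_h)[:, :3]                                # (B,3,H*W)
    proj = K @ pts_t1
    z = proj[:, 2:3].clamp(min=1e-6)
    uv = proj[:, :2] / z                                               # pixels in t+1

    # Sample the depth predicted at frame t+1 at the projected locations.
    grid = torch.stack([uv[:, 0] / (w - 1) * 2 - 1,
                        uv[:, 1] / (h - 1) * 2 - 1], dim=-1).view(b, h, w, 2)
    d_sampled = F.grid_sample(depth_t1, grid, align_corners=True).view(b, 1, h * w)

    # Geometric check: projected depth should agree with the depth seen in t+1.
    geo_static = ((z - d_sampled).abs() / d_sampled.clamp(min=1e-6)) < depth_tol

    # Semantic prior: pixels of a-priori movable classes are treated as dynamic.
    sem_static = torch.ones_like(seg_t, dtype=torch.bool)
    for cid in MOVABLE_CLASS_IDS:
        sem_static &= seg_t != cid

    return (geo_static.view(b, 1, h, w) & sem_static).float()
```

Pixels that project outside the second image fail the geometric check and are also discarded, which errs on the side of removing points rather than keeping unreliable ones.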
3. To address the insufficient utilization of global spatial and temporal dependencies, this dissertation proposes a Transformer-based self-supervised monocular depth estimation and visual odometry (TSSM-VO). The architecture uses a residual neural network and a Transformer block with a residual structure to capture local and global features, respectively, and a pose estimation network built upon the Transformer effectively establishes long-term dependencies across multiple frames. Furthermore, to address the issue that existing methods ignore the structural similarity between augmented depth maps and predicted depth maps, a data augmentation loss function based on Structural Similarity (SSIM) is introduced (a sketch of such a loss appears below). Finally, extensive experiments on public datasets demonstrate that the proposed method not only predicts detailed depth maps and accurate, consistent camera trajectories, but also offers major improvements in parameter count and computational efficiency (the inference speed of DepthNet is around 30 FPS).
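The abstract names an SSIM-based data augmentation loss but gives no formula. The PyTorch sketch below illustrates one plausible form of such a term, a structural-similarity consistency between the depth predicted from an image and from a photometrically augmented copy of it; `depth_net`, `augment`, the 3x3 averaging window, and the alpha = 0.85 weighting are assumptions, not details from the thesis.

```python
# Hypothetical sketch: SSIM-based consistency loss between the depth predicted
# from an original image and from a photometrically augmented copy of it.
import torch
import torch.nn.functional as F

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """Per-pixel SSIM of two maps (B,1,H,W), computed with 3x3 average pooling."""
    mu_x = F.avg_pool2d(x, 3, 1, padding=1)
    mu_y = F.avg_pool2d(y, 3, 1, padding=1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, padding=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, padding=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, padding=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return num / den

def depth_augmentation_loss(depth_net, img, augment, alpha=0.85):
    """Structural-similarity consistency between the depth of an image and the
    depth of its photometric augmentation (both should describe the same scene)."""
    d_orig = depth_net(img)             # (B,1,H,W)
    d_aug = depth_net(augment(img))     # augmentation changes colors, not geometry
    ssim_term = ((1 - ssim(d_orig, d_aug)) / 2).clamp(0, 1).mean()
    l1_term = (d_orig - d_aug).abs().mean()
    return alpha * ssim_term + (1 - alpha) * l1_term
```

Weighting the SSIM term against an L1 term with alpha mirrors the photometric losses commonly used in self-supervised depth estimation.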
Keywords/Search Tags:Visual Odometry, Visual-Inertial Odometry, Loose Coupling, Dynamic Objects Detection, Global Dependencies, Structural Similarity