
Research On Visual Odometry Fusion Of Event Camera And Standard Camera

Posted on: 2024-06-25
Degree: Master
Type: Thesis
Country: China
Candidate: X C Guo
Full Text: PDF
GTID: 2568306920451174
Subject: Software engineering

Abstract/Summary:
In recent decades, robotics and artificial intelligence have developed rapidly, and more and more fields have begun to adopt intelligent unmanned operation. Intelligent machines are gradually replacing high-intensity manual labor, greatly improving work efficiency while maintaining extremely low error rates over long-term operation. For example, unmanned transport in intelligent logistics warehouses improves logistics efficiency; automated drone inspection significantly improves inspection efficiency and reduces manpower; and the introduction of unmanned vehicles is expected to greatly alleviate urban traffic congestion in the future. Building an intelligent unmanned machine requires integrating technologies such as precision navigation, big data, artificial intelligence, sensing technology, and cloud computing. Among these, precision navigation is one of the indispensable technologies that enables unmanned devices to perform tasks according to preset programs.

Current navigation technology for unmanned equipment falls into two categories: active navigation and passive navigation. Passive navigation uses GNSS technology to receive positioning-satellite signals and is currently the most widely used approach. However, in complex terrain or densely populated areas, satellite signals may be interfered with, exposing the weakness of passive navigation; active navigation can effectively compensate for this deficiency. Active navigation relies mainly on visual positioning technology, that is, visual odometry, which has a research history of more than 30 years, is in a stage of rapid development, and has been widely applied in robotics, mixed reality, autonomous driving, and other fields. Compared with passive navigation, visual odometry can compute the sensor's motion trajectory in real time and build a 3D map without relying on an external positioning system, and it can work relatively accurately in unknown environments.

The basic principle of visual odometry is to track the camera's motion by recording a series of consecutive images. Visual odometry based on standard cameras is currently the most widely used. However, standard cameras perform poorly under fast motion and challenging lighting conditions, which significantly degrades the robustness of visual odometry built on them. An event camera is a novel biologically inspired sensor that asynchronously generates and records events whenever the scene brightness changes. Compared with standard cameras, event cameras offer higher temporal resolution and a wider dynamic range, giving them clear advantages under high-speed motion and challenging lighting conditions. Event cameras can therefore compensate for the shortcomings of standard cameras and enable more robust visual odometry.

The research goal of this paper is to fuse event cameras and standard cameras to build visual odometry suitable for more complex scenes. This paper proposes two end-to-end deep learning methods that fuse the event and video modalities to achieve robust motion estimation. The starting point is to fully exploit the complementarity of the two modalities: standard cameras provide rich semantic information, while event cameras address high-speed motion blur and extreme lighting environments.
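To make the event-generation principle concrete, the following is a minimal sketch, not the paper's implementation: a pixel emits a signed event once its log-intensity change exceeds a contrast threshold. The function name and threshold value are illustrative, and real sensors fire per pixel asynchronously rather than between two snapshots.

```python
import numpy as np

def simulate_events(log_I_prev, log_I_curr, t_prev, t_curr, C=0.2):
    """Toy event-generation model: a pixel fires an event when its
    log-intensity changes by at least the contrast threshold C.
    Returns a list of (x, y, timestamp, polarity) tuples."""
    diff = log_I_curr - log_I_prev
    ys, xs = np.nonzero(np.abs(diff) >= C)
    t = (t_prev + t_curr) / 2.0  # crude timestamp between the two snapshots
    # Polarity is the sign of the brightness change (+1 or -1).
    return [(int(x), int(y), t, int(np.sign(diff[y, x]))) for y, x in zip(ys, xs)]
```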
On this basis, extensive experimental verification shows that the proposed DeepEVO model improves the accuracy of pose estimation, and is both more accurate and applicable to more scenes than estimating the pose from video alone. The contributions of this paper are as follows.

First, this paper proposes a method for processing event data and synchronizing it with video data. It solves the synchronization problem between high-temporal-resolution event data and low-temporal-resolution video data, allowing the two modalities to be fused more effectively.

Second, this paper proposes a fusion module based on a channel attention mechanism, and designs an end-to-end visual odometry model that fuses events and video on top of this module. The model can autonomously select whichever of the event and video modalities better suits the current scene, fully releasing the potential of the two data sources and making the model applicable to more complex scenarios.

Third, this paper proposes a transformer-based data-fusion method and uses this fusion module to design an end-to-end visual odometry model that processes event and video data simultaneously. The module attends to spatial as well as channel information, achieving deeper cross-modal fusion, and it replaces the CNN encoder with a transformer encoder in order to capture global feature information.
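For the transformer-based module, a hedged sketch of bidirectional cross-attention between flattened event and video tokens follows; the embedding dimension, head count, and the final concatenation are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Cross-attention in both directions between event tokens and
    video tokens, capturing spatial as well as channel interactions."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.v2e = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.e2v = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tok_event, tok_video):
        # tok_*: (B, N, dim) sequences of flattened patch features.
        v, _ = self.v2e(tok_video, tok_event, tok_event)  # video queries events
        e, _ = self.e2v(tok_event, tok_video, tok_video)  # events query video
        return torch.cat([v, e], dim=-1)                  # fused tokens (B, N, 2*dim)
```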
Keywords/Search Tags:Visual Odometry, Event Camera, SLAM, Vision-based navigation, Attention mechanisms, Transformer