| With the rapid development of autonomous vehicle technologies, high-precision localization in unknown, complex outdoor environments has become an important problem. Monocular visual odometry is a low-cost localization method and one of the most widely used. Traditional visual odometry methods compute relative pose from the principles of multi-view geometry, which makes them sensitive to camera parameters and environmental changes; the computation is also complicated. The rapid development of deep learning in recent years has provided new solutions for visual odometry that perform end-to-end pose estimation. Deep learning-based visual odometry is more robust, but two problems remain. First, it cannot handle moving objects, occlusion, and low-texture or highly similar areas. Second, the end-to-end process makes it difficult to correct pose estimation errors, and the accumulated error greatly degrades localization accuracy. To address these two problems, we propose two solutions based on the idea of confidence evaluation. The first evaluates the confidence of the input image pixels. We propose a novel monocular visual odometry method based on image region confidence, which measures the relative similarity of geometrically corresponding regions in the associated images, where the correspondence is computed from the estimated pose transformation. This similarity measure is then used to generate a confidence mask that weights pixels in the vision reconstruction loss function, so that the network can be trained in an unsupervised manner and jointly predict depth and relative pose. Experimental results and analysis show that this scheme effectively improves the robustness of visual odometry in the presence of moving objects, occlusion, and low-texture or highly similar areas. The second evaluates the confidence of the output pose estimates. We propose a pose optimizer that measures the confidence of the original pose estimates based on the geometric consistency of the trajectory, and then uses an attention mechanism to generate an attention vector that iteratively refines the original relative pose estimates based on temporal convolution. Experimental results show that this scheme effectively redistributes the errors of the pose estimates and improves the accuracy of the visual odometry. In summary, we propose a two-stage end-to-end visual odometry framework. The first stage is a pose estimator that focuses on the confidence of the input image. The second stage is a pose optimizer that focuses on the confidence of the output pose estimates. Both aim to improve the accuracy and robustness of visual odometry. |
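
The idea of weighting the reconstruction loss by a confidence mask can be illustrated with a minimal NumPy sketch. It is not the implementation described in the abstract: the function names, the exponential similarity measure, and the `sigma` parameter are all assumptions chosen for illustration, and `warped` stands in for the source image already resampled into the target view using the predicted depth and pose.

```python
import numpy as np


def similarity_confidence(target, warped, sigma=0.1):
    """Hypothetical confidence mask (an assumption, not the paper's measure).

    Returns values near 1 where the target and warped images agree and
    values near 0 where they differ strongly.
    """
    # Per-pixel mean absolute difference over color channels, shape (H, W).
    err = np.abs(target - warped).mean(axis=-1)
    return np.exp(-err / sigma)


def weighted_reconstruction_loss(target, warped, confidence, eps=1e-8):
    """Confidence-weighted L1 reconstruction loss, normalized by total weight."""
    err = np.abs(target - warped).mean(axis=-1)
    return float((confidence * err).sum() / (confidence.sum() + eps))


if __name__ == "__main__":
    # Toy 4x4 RGB images that agree everywhere except one pixel,
    # which plays the role of a moving object or an occlusion.
    target = np.zeros((4, 4, 3))
    warped = target.copy()
    warped[0, 0] = 1.0

    conf = similarity_confidence(target, warped)
    plain = weighted_reconstruction_loss(target, warped, np.ones((4, 4)))
    weighted = weighted_reconstruction_loss(target, warped, conf)
    print("unweighted loss:", plain)
    print("confidence-weighted loss:", weighted)
```

In this toy case the inconsistent pixel receives a confidence close to zero, so it contributes almost nothing to the weighted loss, whereas it dominates the unweighted one. In an actual training setup the mask would typically be computed inside the network's differentiable pipeline rather than with plain NumPy.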