Intelligent perception technology endows robots with the ability to perform dexterous manipulation in complex and uncertain environments. 6D object pose estimation is a fundamental yet critical problem in this area, and a research hotspot in pattern recognition, 3D vision, and robotics. Robotic applications at this stage are limited by the perception ability of robots and cannot work efficiently in complex scenarios. Robust and accurate object pose estimation is a technical challenge faced by intelligent manufacturing, logistics, household, and other service applications; an effective solution to this problem will greatly enhance the intelligence of robots and expand their fields of application. Using different modalities of data, this paper aims to propose robust and accurate object pose estimation methods that improve the environment and task adaptability of robots in severe scenarios, breaking through theoretical and practical bottlenecks and advancing intelligent robot technology and intelligent manufacturing.

The object pose estimation problem in complex scenes faces a series of challenges. Noise, occlusion, cluttered backgrounds, texture-less surfaces, and other distracting factors significantly degrade feature representation capability and the performance of pose estimation algorithms. The accuracy of monocular object pose estimation still needs improvement when no depth information is available as a constraint. The high cost of large-scale real-scene data acquisition restricts the scale of real-world training data, limiting the accuracy and generalization of data-driven deep learning methods. To meet these challenges, this paper is motivated by the internal mechanisms of human perception, abstracted as the abilities of memorization and imagination, of association and prediction, and of multimodal perception. We model the problems from three
perspectives of data prior, feature representation, and mapping constraints to improve the accuracy and robustness of object pose estimation algorithms. First, we study a method for the autonomous generation of synthetic data, which prepares training data for data-driven deep learning methods. Second, when depth information is available, the pose estimation problem is modeled as a 3D-3D mapping learning problem; the extraction and fusion of highly representative features from heterogeneous data sources is the key focus. Third, when depth information is not available, we study the reasonable design of network prediction targets and loss functions.

The main contributions and innovations of this paper are as follows:

1. A large-scale 6D dataset generation method based on multi-dimensional knowledge priors is proposed, which can automatically generate large-scale realistic 6D pose datasets. The pipeline makes full use of semantic, pose, and scene prior knowledge. On the one hand, it solves the problem of uniform multi-view sampling in foreground generation; on the other hand, data coverage and scene authenticity are guaranteed during foreground and background fusion. When used for 6D pose estimation algorithms, it provides abundant training data and does not require a tedious and laborious manual scene setup and labeling process. The effectiveness of the synthetic data is verified through a dataset ablation study, and to a certain extent it breaks the limitation that the lack of real-scene data places on algorithm performance.

2. A 6D object pose estimation method based on an instance segmentation network and iterative optimization is proposed. With a dual RGB-D camera structure as its core, a multi-source depth information fusion algorithm is formed to obtain low-cost
and high-quality depth information, which provides effective support for the algorithm at the data level. We then decouple the object detection and point cloud registration tasks and optimize them separately. A deep neural network framework that fully encodes 3D structural information is proposed, improving the adaptability and robustness of 6D pose estimation for random poses, stacked occlusions, and weakly textured workpieces in industrial sorting scenarios. We build a bin-picking system around the proposed 6D pose estimation method to demonstrate its usability; experiments show that the algorithm's performance meets the requirements of actual production.

3. A 6D object pose estimation method combining semantic and multi-scale geometric information is proposed to solve the feature fusion and end-to-end mapping modeling problems of multi-source heterogeneous data. A novel fusion of appearance semantics and spatial geometric information is designed to overcome the insufficient information fusion of previous methods. We then design a point cloud feature extraction network based on multi-level local information sampling and aggregation, DGCNN, and a geometric attention mechanism. This network sufficiently explores spatial position, topological relationships, geometric structure, and global semantic information, realizing full encoding of spatial information and significantly improving the algorithm's adaptability in complex scenes with background clutter, foreground occlusion, weak texture, etc. Next, we decouple the translation and rotation prediction tasks to form a pose prediction module, and design a loss function with tightly coupled tasks that transforms the point cloud registration problem into a neural network mapping learning problem. Qualitative and quantitative experiments on public datasets show that, compared with other state-of-the-art algorithms, the feature extraction, fusion, and
pose estimation method in this paper achieves higher accuracy and stability, and stronger adaptability in severe scenarios.

4. A dense-reconstruction-guided 6D object pose estimation method based on geometric consistency constraints is proposed. A coordinate reconstruction network is designed to reconstruct the dense coordinates of the visible surface of an object in a canonical 3D space, achieving dense prediction of 2D-3D correspondences. Compared with sparse 2D keypoint prediction, dense 3D coordinate reconstruction simulates the human ability of 3D association, making the network more spatially expressive; dense prediction realizes pixel-level association and is more robust to interference under occlusion. A geometric consistency constraint loss function is introduced: in the absence of depth information, prior knowledge from the 3D object model is used to constrain the geometric structure. At the same time, the solution of the PnP problem is transformed into the parameter learning process of the neural network, forming an effective mapping model for monocular end-to-end pose prediction. Qualitative and quantitative experiments on public datasets show that our method outperforms contemporary algorithms.

To sum up, in view of the current challenges in the field of 6D object pose estimation, this paper models the problems from the three dimensions of data prior, feature representation, and mapping constraints, and conducts research at the three levels of data support, algorithm model, and practical application. We propose a general data generation pipeline and several pose estimation frameworks that improve the robustness and accuracy of object pose estimation in severe scenarios. Our methods offer heuristic value for the research community and significant application prospects for industry.
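As background for the 3D-3D mapping formulation underlying contributions 2 and 3: once point-to-point correspondences between an observed point cloud and the object model are known, the rigid pose (R, t) has a classical closed-form least-squares solution via the Kabsch/Umeyama SVD method, which is what a registration-based pipeline ultimately approximates. The sketch below is illustrative background only, not the network-based method proposed in this paper; the function name `best_fit_pose` is our own.

```python
import numpy as np

def best_fit_pose(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= R @ src + t,
    computed with the Kabsch/Umeyama SVD method.
    src, dst: (N, 3) arrays of corresponding 3D points."""
    c_src = src.mean(axis=0)
    c_dst = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    # guard against a reflection (det = -1) solution
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Demo: recover a known pose from exact correspondences.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                      # make Q a proper rotation
src = rng.standard_normal((50, 3))
t_true = np.array([0.1, -0.2, 0.3])
dst = src @ Q.T + t_true
R, t = best_fit_pose(src, dst)         # R ~= Q, t ~= t_true
```

In practice the correspondences are noisy and partially wrong, which is why the methods in this paper learn robust feature representations and couple the registration step with the network's loss rather than applying the closed-form solution directly.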