| Artificial intelligence, with deep learning at its core, makes intelligent robots possible. Spatial computing is a key technology that supports intelligent interaction between robots and 3D environments. It mainly addresses perception of, computation about, and interaction with the environment, and it can be widely applied in scenarios such as Virtual Reality (VR), Augmented Reality (AR), self-driving, and robotic systems. Three-dimensional (3D) object detection, that is, the problem of 6-DoF pose estimation of objects, is a fundamental problem in spatial computing and in the perception of intelligent robots. At present, research on 6-DoF object pose estimation faces two challenges: (1) Shape representation is the basis of robotic perception and spatial computing, and it is also the core of the object pose estimation problem. However, existing data structures cannot be directly applied to object pose estimation methods. How to represent shape in a form that neural networks can easily process is a scientific problem that must be solved for object pose estimation. (2) Training data is the core of deep learning algorithms. Current monocular pose estimation methods based on deep learning rely on accurate object CAD models and object pose labels. Labeling the data is a very complex and tedious process. How to reduce the dependence of deep learning algorithms on data labels is an important research topic for learning-based pose estimation methods. Focusing on the above challenges, this thesis carries out systematic research on the problem of 6-DoF object pose estimation and achieves the following innovative results: 1. This thesis proposes a framework for solving the monocular object pose estimation problem, together with a unified rendering-based representation method for objects. The thesis summarizes the existing methods, makes a reasonable abstraction of monocular object pose estimation methods that use deep learning, combines the
existing explicit and implicit object representation methods, and proposes a unified rendering-based object representation. A solution framework for the general pose estimation problem is designed. According to the label generation strategy, we divide the pose estimation problem into two broad categories: fully-supervised and weakly-supervised learning problems. 2. For explicit object shape representation in the fully-supervised setting, the thesis proposes an explicit object shape representation based on star convex hull approximation, named Polarmesh. Using spherical projection and polar coordinate calculation, we represent the irregular 3D shape and convert it, by star convex hull approximation, into a fixed-size 2D grid named Polarmap, which can be directly and conveniently integrated into 2D convolutional neural networks. Since the object rotation is implicitly encoded in the Polarmap, we design a pose estimation network, named PM-Net, to directly estimate the Polarmap and the coordinates. Experiments on public datasets verify the effectiveness of the method: it not only recovers the object shape with high quality but also estimates the object pose with high accuracy. 3. For implicit object shape representation under weak supervision, the thesis proposes an implicit object representation encoded by a neural network. To eliminate the dependence on 3D object labels and 3D object model labels in the training dataset, we design a deep learning network model based on multi-view constraints to directly regress the 6-DoF object poses. We implicitly encode the object shape information in the network parameters. Our proposed Rotated-IoU (Intersection over Union) loss function uses multi-view constraints and the segmentation mask to supervise the direct pose regression. We perform experimental verification on different datasets. Although there is a gap compared to the
experimental results of fully-supervised methods, our weakly-supervised direct pose estimation method achieves basic estimation accuracy and runs in near real time. 4. For implicit object shape representation under weak supervision, the thesis proposes an implicit object representation based on Neural Radiance Field (NeRF) technology, named OBJ-NeRF. The OBJ-NeRF model is reconstructed in a weakly-supervised manner from training data with known relative camera poses and 2D segmentation masks from multiple viewpoints. Based on the OBJ-NeRF object representation, we design an object pose estimation network, named NeRF-Pose, which takes a monocular image as input and outputs the associated dense correspondences. We also enhance the PnP+RANSAC pose solving method using the proposed OBJ-NeRF. Experiments on public datasets verify that the OBJ-NeRF implicit object model can be reconstructed with high accuracy under weak supervision, and that the proposed method obtains pose accuracy comparable to that of fully-supervised methods. In conclusion, under the framework of spatial computing, this thesis proposes a rendering-based object representation, which makes the framework of monocular object pose estimation clearer and more direct. By systematically studying different object representation methods under different training conditions, we design several suitable methods to solve the problem of 6-DoF pose estimation under fully-supervised or weakly-supervised conditions. In tests on public datasets, our pose estimation methods obtain robust and high-accuracy object poses, demonstrating the feasibility and effectiveness of the proposed framework. |
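
The second contribution describes converting an irregular 3D shape into a fixed-size 2D grid through spherical projection and polar coordinates. The abstract does not give the construction itself, so the following is only a minimal sketch of that general idea and not the thesis's Polarmesh implementation. The function names, the grid size, the choice of centre, and the rule of keeping the largest radius per angular bin (a rough stand-in for a star-convex approximation) are all assumptions made for illustration.

```python
import numpy as np


def points_to_polar_map(points, height=16, width=32, center=None):
    """Bin surface points by spherical angle and keep the largest radius per bin.

    Hypothetical helper: the result is a (height, width) grid of radial
    distances, with rows indexing the polar angle and columns the azimuth.
    Bins that receive no point stay at 0.
    """
    points = np.asarray(points, dtype=float)
    if center is None:
        center = points.mean(axis=0)  # assumed centre of the star-shaped hull
    centered = points - center
    radius = np.linalg.norm(centered, axis=1)
    safe_radius = np.maximum(radius, 1e-12)  # avoid division by zero

    # Polar angle in [0, pi], azimuth in [-pi, pi].
    polar = np.arccos(np.clip(centered[:, 2] / safe_radius, -1.0, 1.0))
    azimuth = np.arctan2(centered[:, 1], centered[:, 0])

    rows = np.minimum((polar / np.pi * height).astype(int), height - 1)
    cols = np.minimum(((azimuth + np.pi) / (2 * np.pi) * width).astype(int), width - 1)

    polar_map = np.zeros((height, width))
    # Unbuffered maximum: each bin ends up holding its farthest point's radius.
    np.maximum.at(polar_map, (rows, cols), radius)
    return polar_map


def polar_map_to_points(polar_map):
    """Decode the grid back to one 3D point per bin, along the bin-centre ray."""
    height, width = polar_map.shape
    polar = (np.arange(height) + 0.5) / height * np.pi
    azimuth = (np.arange(width) + 0.5) / width * 2 * np.pi - np.pi
    polar_grid, azimuth_grid = np.meshgrid(polar, azimuth, indexing="ij")
    directions = np.stack(
        [
            np.sin(polar_grid) * np.cos(azimuth_grid),
            np.sin(polar_grid) * np.sin(azimuth_grid),
            np.cos(polar_grid),
        ],
        axis=-1,
    )
    return (polar_map[..., None] * directions).reshape(-1, 3)
```

Because the output has a fixed shape regardless of how many surface points the object has, a grid of this kind can be treated like a single-channel image by a 2D convolutional network, which is the property the abstract highlights. Rotating the input points shifts and warps the grid contents, which is one way to read the statement that the rotation is implicitly encoded in the map.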