| Estimating the 6D pose of rigid objects from monocular images is an important research direction in the field of 3D vision.This monocular image-based method has the advantages of low cost,easy data acquisition,and easy data processing,and has broad application prospects in the fields of robot arm operation,autonomous driving,and augmented reality.Most of the current object 6D pose estimation networks rely on 3D models of objects,which can only estimate the pose of specific object instances and cannot be extended to ”unseen” objects.In this thesis,we propose a 6D pose estimation network model based on an implicitly encoded prior for category shape,which relies on category information to extend the 6D pose estimation capability to ”unseen” object instances without relying on the 3D model of the object instances.The main work and innovations of this thesis are as follows:1.In this thesis,we design a feature fusion approach based on a self-attentive mechanism,which is used to better fuse pixel features and point cloud features in instance RGB-D inputs.And we use a similar approach to learn the fusion encoding of object instance features and category shape implicit features.2.This thesis proposes a category shape implicit encoding to solve the shape variability between different instances within a category.The existing 3D model instances in the category are used to learn the implicit encoding of category-specific 3D shapes by auto-encoder for subsequent 6D pose estimation tasks.3.This thesis proposes an algorithmic framework for predicting 6D poses of categorylevel target objects from monocular RGB-D images.This framework introduces category shape implicit encoding as a prior to reconstruct object 3D models and guides the network to better predict the correspondence between NOCS coordinates and camera spatial coordinates.Extensive experiments on synthetic and real datasets show that the 6D pose estimation network model for category-level objects proposed in this thesis is more effective in estimating the 6D poses of category-level objects and has better performance in the 6D pose estimation metric. |