
Research On Object Grasp Affordance Prediction And 3D Reconstruction Based On Deep Learning

Posted on: 2024-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z S Wu
Full Text: PDF
GTID: 2568307151460634
Subject: Computer Science and Technology
Abstract/Summary:
Learning grasp skills in clutter is an active topic in collaborative-robot research, and scene understanding is a prerequisite for a robot to grasp correctly. Robots need to infer 3D scenes from incomplete perception in order to generate more stable grasps. In real environments the workspace is limited, so the complete geometry of an object cannot be observed; it must be inferred from limited information and partial viewpoints. 3D reconstruction is one means by which a robot can understand its work scene. Reconstruction supervision helps capture perceptual geometry features and yields richer feature representations. This thesis therefore uses multi-task learning to jointly train grasp affordance prediction and 3D reconstruction. The specific research contents are as follows.

First, to address the limited geometric information available from limited viewpoints and low model generalization, a grasp affordance prediction and 3D reconstruction (3D-GAPR) algorithm based on U-MF and implicit neural representation (UMF-INR) is proposed. Hard parameter sharing is adopted to train grasping and reconstruction simultaneously, which improves the model's generalization ability. A U-MF module is designed to extract shared features, enable interaction between local and global features, and enhance the expressive power of the shared geometry features. Both tasks are trained differentiably on the basis of implicit neural representation, so that computing resources can be adaptively allocated to the tasks that are more difficult to train.

Second, a 3D-GAPR algorithm based on positional encoding (PE) and a hierarchical U-Swin T is proposed to address position sensitivity and instability in grasping, as well as the low sharing efficiency of U-MF. Voxel positions are encoded through a set of learnable parameters, and the position information is embedded into the original data so that the model learns positions that are easy to grasp, which improves grasp robustness. A hierarchical U-Swin T module based on Swin Transformer is designed to integrate global and local information, generate multi-scale feature representations, and improve feature sharing efficiency.

Finally, the PyBullet simulation environment is used to collect data from Packed and Pile scenarios in a self-supervised manner for training the model. Grasps are executed in the simulation environment to verify the effectiveness of introducing UMF-INR, positional encoding, and the hierarchical structure. For more intuitive observation, the grasp affordance heat maps are visualized to further verify the role of positional encoding and reconstruction in grasping. The reconstruction results are also visualized to verify that embedding position information helps depict reconstruction details.
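The hard parameter sharing and learnable positional encoding described in the abstract can be illustrated with a small forward-pass sketch. It is not the thesis's implementation: the class `SharedTrunkModel`, the function `joint_loss`, the layer sizes, the fixed loss weights, and the use of discrete voxel indices in place of an implicit neural representation queried at continuous points are all assumptions made for illustration. The U-MF and U-Swin T modules are replaced here by a single shared linear layer.

```python
import math
import random

random.seed(0)

def init(rows, cols):
    """Random weight matrix stored as a list of rows."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def linear(weights, bias, x):
    """Compute y = W x + b for one feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one prediction p and label y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

class SharedTrunkModel:
    """Hypothetical two-task model: one shared encoder feeding two heads."""

    def __init__(self, num_voxels, in_dim, hidden_dim):
        # Learnable positional table, one vector per voxel index (assumption).
        self.pos = init(num_voxels, in_dim)
        # Shared encoder: these parameters serve both tasks (hard sharing).
        self.enc_w, self.enc_b = init(hidden_dim, in_dim), [0.0] * hidden_dim
        # Task-specific heads: grasp quality and occupancy (reconstruction).
        self.grasp_w, self.grasp_b = init(1, hidden_dim), [0.0]
        self.recon_w, self.recon_b = init(1, hidden_dim), [0.0]

    def forward(self, voxel_index, feature):
        # Embed position information into the input features.
        x = [f + p for f, p in zip(feature, self.pos[voxel_index])]
        # Shared features with a ReLU non-linearity.
        shared = [max(0.0, h) for h in linear(self.enc_w, self.enc_b, x)]
        quality = sigmoid(linear(self.grasp_w, self.grasp_b, shared)[0])
        occupancy = sigmoid(linear(self.recon_w, self.recon_b, shared)[0])
        return quality, occupancy

def joint_loss(model, sample, w_grasp=1.0, w_recon=1.0):
    """Weighted sum of both task losses; fixed weights are an assumption."""
    quality, occupancy = model.forward(sample["voxel"], sample["feature"])
    return (w_grasp * bce(quality, sample["grasp_label"])
            + w_recon * bce(occupancy, sample["occupancy_label"]))

model = SharedTrunkModel(num_voxels=8, in_dim=4, hidden_dim=6)
sample = {"voxel": 3, "feature": [0.2, -0.1, 0.5, 0.0],
          "grasp_label": 1, "occupancy_label": 1}
quality, occupancy = model.forward(sample["voxel"], sample["feature"])
loss = joint_loss(model, sample)
```

Because both heads read the same `shared` vector, a gradient from either loss term would update the shared encoder and the positional table. That is the mechanism by which reconstruction supervision can shape the features used for grasping.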
Keywords/Search Tags: multi-task learning, hard-parameter sharing, grasp affordance prediction, 3D reconstruction, implicit neural representation