
Research On Skill Learning Methods For Robotic Grasping And Packing In Complex Scenes

Posted on: 2024-02-20 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: S Yang | Full Text: PDF
GTID: 1528306917989049 | Subject: Pattern Recognition and Intelligent Systems

Abstract/Summary:
With the development of society and continual advances in technology, the demand for robots in fields such as industrial manufacturing, domestic services, healthcare, agricultural production, and national defense is growing by the day. However, the perception and decision-making abilities of robots in complex working scenarios remain weak and fall far short of expectations. Enhancing robots' ability to perceive and understand the external environment, and improving their decision-making in complex, dynamic settings, is one of the key issues that urgently needs to be addressed in robotics. Focusing on the practical application requirements of robotic arms across a variety of tasks, this dissertation aims to enhance the understanding of complex dynamic scenes and systematically investigates how to improve the efficiency, robustness, and adaptability of robot skill learning. The main contributions and innovations are summarized as follows:

1. To achieve flexible and efficient robotic grasping, we propose a grasping skill learning method based on video captioning. First, following a "divide and conquer" strategy, the complex multi-task imitation learning problem is decomposed into two subproblems: demonstration understanding and skill learning. For demonstration understanding, we propose a demonstration video understanding model based on visual difference enhancement, which effectively mines how the content of a demonstration video changes along the temporal dimension and forms a high-quality video representation accordingly. This representation is then converted into descriptive sentences by the proposed video captioning method, achieving accurate semantic understanding of human demonstration videos. For skill learning, a manipulation affordance prediction model based on deep reinforcement learning is built, enabling the robot arm to acquire various manipulation skills in a self-learning manner. Combined with the proposed demonstration understanding method, this forms a novel "video-text-action" pipeline (sketched below) that maps human demonstration videos to robot action execution. Experimental results show that the proposed visual demonstration-based multi-task learning method reliably completes various imitation tasks in diverse scenarios, demonstrating high robustness and good generalizability.
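To make the "video-text-action" pipeline concrete, the following minimal Python sketch wires the three stages together. All class names, shapes, and outputs here are illustrative placeholders assumed for this sketch, not the dissertation's actual models.

```python
# A minimal sketch of the "video -> text -> action" pipeline described above.
# VideoEncoder, Captioner, and AffordancePredictor are hypothetical stand-ins.
import numpy as np

class VideoEncoder:
    """Toy stand-in for the visual-difference-enhanced video encoder:
    weights frames by how much they differ from their predecessors."""
    def encode(self, frames: np.ndarray) -> np.ndarray:
        diffs = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2, 3))  # (T-1,)
        weights = diffs / (diffs.sum() + 1e-8)
        per_frame = frames[1:].mean(axis=(1, 2, 3))                   # (T-1,)
        # Difference-weighted temporal pooling as a crude video representation.
        return np.array([np.dot(weights, per_frame)])

class Captioner:
    """Toy stand-in: maps the video representation to a task sentence."""
    def caption(self, feat: np.ndarray) -> str:
        return "pick up the red block"  # placeholder caption

class AffordancePredictor:
    """Toy stand-in for the RL-trained affordance model: returns a
    per-pixel grasp-affordance map for the current observation."""
    def predict(self, obs: np.ndarray, instruction: str) -> np.ndarray:
        rng = np.random.default_rng(0)
        return rng.random(obs.shape[:2])  # (H, W) affordance scores

def demo_to_action(frames, obs):
    feat = VideoEncoder().encode(frames)
    instruction = Captioner().caption(feat)
    affordance = AffordancePredictor().predict(obs, instruction)
    v, u = np.unravel_index(np.argmax(affordance), affordance.shape)
    return instruction, (u, v)  # grasp pixel, to be back-projected to 3D

if __name__ == "__main__":
    frames = np.random.rand(8, 64, 64, 3)  # demonstration video (T, H, W, C)
    obs = np.random.rand(64, 64, 3)        # robot's current observation
    print(demo_to_action(frames, obs))
```

In the real system each placeholder would be a trained network: the encoder implements visual difference enhancement, the captioner is the proposed video captioning model, and the affordance predictor is learned with deep reinforcement learning.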
2. To further improve generalization across scenes and the efficiency of task execution for visual demonstration methods, we propose a cross-context robot visual imitation learning method. The key to cross-context skill learning is to abstract semantically consistent expressions of the same task in different scenes through effective perception algorithms. To this end, we propose a context translation model built on a feature-difference-based twin structure. By mining the task-level correlation between the demonstrator's scene and the imitator's scene, it achieves cross-context visual style transfer and task semantic association for demonstration videos. Furthermore, we extend the context translation method with a depth modality, introducing a depth estimation method for cross-context scenarios that provides additional observation modalities for robot skill learning. By integrating color and depth data, we build an end-to-end robot skill learning model based on multi-modal fusion (an illustrative fusion sketch follows the abstract), enabling a direct mapping from multi-modal observations to complex execution actions and significantly enhancing the robot's scene perception and decision-making in complex environments. Finally, extensive experiments demonstrate that the proposed method effectively overcomes the perceptual understanding challenges posed by cross-context scenarios and completes cross-context visual imitation tasks through accurate and efficient action mapping.

3. To address the low accuracy and poor robustness of target grasping in complex cluttered scenarios, we propose an autonomous target grasping method for cluttered scenes. Object occlusion in cluttered scenes makes it difficult to obtain the complete observations needed for accurate scene perception. We therefore propose a multi-object, multi-class scene representation for complex scenarios. By integrating geometric attributes with high-level semantics, this representation enables robots to accurately perceive and locate objects in the scene from both visual and semantic perspectives. On this basis, we construct a multi-channel grasping saliency prediction model based on reinforcement learning (see the saliency sketch following the abstract), endowing robots with efficient and reliable grasping capabilities and achieving robust target grasping in cluttered scenes. Finally, extensive comparative experiments verify the effectiveness of the proposed method.

4. To address the low efficiency and poor stability of robotic packing in unstructured scenarios, we propose an online heterogeneous bin packing method for robots in unstructured logistics scenarios. First, drawing on human packing experience, we build an imitation learning-based learning and transfer model for online 3D bin packing, which efficiently solves and optimizes the online three-dimensional bin packing problem and significantly improves overall space utilization (a toy placement sketch follows the abstract). To deploy the 3D bin packing policy in diverse real scenarios, we propose a target-oriented 6-DoF packing pose estimation method and a dynamic packing path optimization method. These methods enable reasonable action planning based on accurate perception of unstructured logistics scenarios, achieving efficient, robust, and adaptive packing performance in real logistics settings. Experimental results show that the proposed method performs well across a variety of scenarios, producing neatly stacked, tightly arranged carton piles with high space utilization.
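The multi-modal fusion model of contribution 2 can be pictured as a two-stream encoder over color and depth. The sketch below is a minimal, hypothetical PyTorch version; the layer sizes and the 7-dimensional action head are assumptions made for illustration, not the dissertation's architecture.

```python
# Hypothetical two-stream RGB-D fusion policy (illustrative sizes only).
import torch
import torch.nn as nn

class RGBDPolicy(nn.Module):
    """Encodes color and depth separately, fuses them, and regresses an action."""
    def __init__(self, action_dim: int = 7):  # e.g., 6-DoF pose + gripper state
        super().__init__()
        def stream(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 5, stride=2), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.rgb, self.depth = stream(3), stream(1)
        self.head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(),
                                  nn.Linear(64, action_dim))

    def forward(self, rgb, depth):
        # Late fusion: concatenate the two modality embeddings, then decode.
        fused = torch.cat([self.rgb(rgb), self.depth(depth)], dim=1)
        return self.head(fused)

policy = RGBDPolicy()
action = policy(torch.rand(1, 3, 96, 96), torch.rand(1, 1, 96, 96))
print(action.shape)  # torch.Size([1, 7])
```

A real policy would be trained end to end on demonstration data so that the fused features map directly to execution actions.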
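For contribution 3, the multi-channel grasping saliency idea can be illustrated as follows: the model outputs one saliency map per discretized gripper rotation, and a target mask from the scene representation restricts selection to the goal object. All names, shapes, and the random scores below are assumptions for illustration, standing in for the learned model.

```python
# Schematic multi-channel grasp-saliency selection (illustrative stand-in).
import numpy as np

NUM_ROTATIONS = 16  # discretized gripper orientations, one saliency map each

def predict_saliency(obs: np.ndarray) -> np.ndarray:
    """Placeholder for the RL-trained model: (NUM_ROTATIONS, H, W) scores."""
    rng = np.random.default_rng(1)
    return rng.random((NUM_ROTATIONS, *obs.shape[:2]))

def select_grasp(obs: np.ndarray, target_mask: np.ndarray):
    """Pick the best (rotation, pixel) restricted to the target object,
    which is how the scene representation focuses grasps on the goal."""
    saliency = predict_saliency(obs)
    saliency = np.where(target_mask[None], saliency, -np.inf)
    r, v, u = np.unravel_index(np.argmax(saliency), saliency.shape)
    angle = r * (360.0 / NUM_ROTATIONS)
    return (u, v), angle

obs = np.random.rand(64, 64, 4)  # RGB-D observation
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 20:40] = True        # target object's segmentation mask
print(select_grasp(obs, mask))
```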
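For contribution 4, the sketch below substitutes a classical deepest-bottom-left placement heuristic (a common baseline, explicitly not the dissertation's imitation-learned policy) to show the shape of the online 3D bin packing decision: each box arrives one at a time and is placed where its resting height over a heightmap is lowest.

```python
# Toy online bin-packing placement via a deepest-bottom-left heuristic.
import numpy as np

def place(heightmap: np.ndarray, box_l: int, box_w: int, box_h: int):
    """Scan feasible footprints and return the placement whose resting
    height is lowest (ties broken bottom-left); None if the box cannot fit."""
    H, W = heightmap.shape
    bin_height = 10  # assumed container height
    best = None
    for x in range(H - box_l + 1):
        for y in range(W - box_w + 1):
            z = heightmap[x:x + box_l, y:y + box_w].max()
            if z + box_h <= bin_height and (best is None or z < best[0]):
                best = (z, x, y)
    if best is None:
        return None
    z, x, y = best
    heightmap[x:x + box_l, y:y + box_w] = z + box_h  # commit the placement
    return x, y, z

bin_map = np.zeros((10, 10), dtype=int)
for box in [(4, 3, 2), (5, 5, 3), (3, 3, 1)]:  # boxes arrive online
    print(box, "->", place(bin_map, *box))
```

The learned policy plays the role of this heuristic but chooses placements from imitation of human packing experience, and the 6-DoF pose estimation and path optimization then realize each placement on the real robot.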
Keywords/Search Tags: Skill learning, Robot, Learning from demonstration, Scene understanding, Reinforcement learning