With the progress of industrial robot technology and the rising requirements of industrial automation, robots are increasingly being used to perform diverse tasks. Over the past decade, automated robotic assembly has been a challenging research field, with high-performance peg-in-hole assembly a particularly active topic. This paper studies the compliant assembly technology of manipulators and proposes a strategy for learning assembly through imitation and reinforcement learning, achieving automated manipulator peg-in-hole assembly while ensuring compliance throughout the task.

First, this paper studies the generation of demonstration data for imitation learning. Expert demonstrations play a crucial role in imitation learning, and generating them is a difficult problem. This paper studies demonstration data generation in virtual and real environments respectively. In the virtual environment, demonstration data are obtained by training and evaluating a reinforcement learning policy. In the real environment, we model the demonstration data of the manipulator peg-in-hole assembly task, obtain demonstrations through expert kinesthetic (drag-and-teach) guidance, and propose a "high-frequency sampling, low-frequency recombination" method to process and expand the demonstration data (see the first sketch below).

Second, to address the low sample utilization of the traditional GAIL framework, this paper draws on the idea of hindsight experience replay and proposes the hindsight transformation generative adversarial imitation learning (HT-GAIL) algorithm. HT-GAIL converts part of the trajectories generated by the generator into expert-like data, which then also participates in training the discriminator, improving sample utilization while alleviating the shortage of expert demonstrations (see the second sketch below). We verify the algorithm in the Isaac Gym virtual simulation environment, showing that HT-GAIL accelerates training convergence and learns policies similar to the expert demonstrations, laying the foundation for the subsequent peg-in-hole assembly experiments on a physical platform.

Third, to further improve task performance and enable the policy to exceed the level of the expert demonstrations, this paper combines offline reinforcement learning and proposes an improved offline adversarial motion priors (AMP) algorithm. The algorithm deploys the generator policy network parameters trained by HT-GAIL as offline data in the environment and further optimizes the existing policy through the improved AMP algorithm, enabling the policy to complete higher-level tasks with a small amount of additional training time. We verify the algorithm in the Isaac Gym virtual environment, showing that the improved offline AMP algorithm optimizes the policy in a short time and that the new policy can even exceed the level of the expert demonstrations.
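One plausible reading of the "high-frequency sampling, low-frequency recombination" method is that each kinesthetic demonstration is recorded at a high control rate and then sliced into several phase-shifted low-rate trajectories. The following is a minimal sketch under that assumption; the 100 Hz recording rate, 10 Hz policy rate, and 7-dimensional state are hypothetical values chosen for illustration, not figures from the paper.

```python
import numpy as np

def recombine_demo(traj, stride):
    """Expand one high-frequency demonstration into `stride` low-frequency ones.

    `traj` is a (T, d) array of states (or state-action pairs) recorded at a
    high control rate; taking every `stride`-th sample with a different phase
    offset yields `stride` distinct trajectories at rate f_high / stride.
    """
    return [traj[offset::stride] for offset in range(stride)]

# Example: a 10 s kinesthetic demonstration recorded at 100 Hz, recombined
# into 10 trajectories at a hypothetical 10 Hz policy control rate.
demo = np.random.randn(1000, 7)            # placeholder for recorded joint states
augmented = recombine_demo(demo, stride=10)
print(len(augmented), augmented[0].shape)  # 10 trajectories of shape (100, 7)
```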
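As described above, the core of HT-GAIL is that transformed generator rollouts join the true demonstrations on the expert side of the discriminator. Below is a minimal PyTorch sketch assuming the transformation is HER-style goal relabeling (replacing the intended goal with the achieved one, so that a failed rollout reads as a successful, expert-like one); the goal-conditioned input layout, dimensions, network, and learning rate are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

def hindsight_relabel(states, goals, achieved):
    """Relabel a failed generator rollout with the goal it actually achieved
    (as in hindsight experience replay) so it reads as expert-like data."""
    new_goal = achieved[-1].expand_as(goals)   # achieved final state as the goal
    return torch.cat([states, new_goal], dim=-1)

def discriminator_step(D, opt, expert, relabeled, policy):
    """One GAIL discriminator update in which hindsight-relabeled generator
    data joins the true demonstrations on the expert (label 1) side."""
    bce = nn.BCEWithLogitsLoss()
    pos = torch.cat([expert, relabeled])
    loss = bce(D(pos), torch.ones(len(pos), 1)) + \
           bce(D(policy), torch.zeros(len(policy), 1))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Minimal usage with dummy tensors (state dim 7, goal dim 3 -> input dim 10).
D = nn.Sequential(nn.Linear(10, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(D.parameters(), lr=3e-4)
expert  = torch.randn(32, 10)
policy  = torch.randn(32, 10)
relabel = hindsight_relabel(torch.randn(32, 7), torch.randn(32, 3),
                            torch.randn(32, 3))
print(discriminator_step(D, opt, expert, relabel, policy))
```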
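For the improved offline AMP stage, the text specifies only that the HT-GAIL policy parameters serve as offline data and that the policy is then optimized further. A common AMP formulation mixes a task reward with a style reward derived from a least-squares discriminator; the sketch below assumes that standard formulation (the clipped quadratic style reward from the original AMP paper by Peng et al., 2021), and the mixing weights and checkpoint name are hypothetical.

```python
import torch

def amp_style_reward(d):
    """Style reward from a least-squares AMP discriminator score d, using the
    clipped quadratic form max(0, 1 - 0.25 * (d - 1)^2) of the AMP paper."""
    return torch.clamp(1.0 - 0.25 * (d - 1.0) ** 2, min=0.0)

def mixed_reward(r_task, d, w_task=0.5, w_style=0.5):
    """Total reward: the task term lets the policy surpass the demonstrations,
    while the style term keeps it close to the prior (HT-GAIL) behavior."""
    return w_task * r_task + w_style * amp_style_reward(d)

# The fine-tuned policy would start from the HT-GAIL generator weights, e.g.
# policy.load_state_dict(torch.load("ht_gail_policy.pt"))  # hypothetical path
r = mixed_reward(r_task=torch.tensor([1.0]), d=torch.tensor([0.3]))
print(r)  # 0.5 * 1.0 + 0.5 * 0.8775 = 0.93875
```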
Finally, this paper builds a physical experiment platform for autonomous compliant peg-in-hole assembly by a manipulator, collects expert demonstration data for a peg-in-hole assembly task with 0.80 mm clearance, and trains a policy with the HT-GAIL algorithm. The experimental results show that the policy converges in about 11.5 hours, and the trained policy completes the 0.80 mm clearance task with an 87% success rate while meeting the compliance requirements. The experiment shows that HT-GAIL can learn the task policy from the demonstration data, and a comparison experiment with the GAIL algorithm confirms that HT-GAIL converges faster. To further improve the performance of the policy, we trained the improved offline AMP algorithm using the policy's network parameters as offline data. After about 5 hours of training, the success rate of the 0.80 mm clearance task increased from 87% to 95%, that of the 0.52 mm clearance task from 55% to 78%, and that of the 0.18 mm clearance task from 12% to 57%, all while meeting the compliance requirements. These experiments demonstrate that the improved offline AMP algorithm can significantly improve policy performance in a small amount of time, exceed the level of the expert demonstrations, and accomplish higher-precision assembly tasks.