Multi-legged robots are an important research direction in robotics, offering high mobility and adaptability. Deep reinforcement learning (DRL) has achieved remarkable results in controlling multi-legged robots, but it still suffers from slow training and poor transfer. Scholars have therefore introduced meta-learning into reinforcement learning and proposed meta-reinforcement learning (Meta-RL), which uses past experience to learn the latent patterns shared across tasks. When a task changes suddenly, for example when a leg joint changes, the model can generalize to the new task with minimal training data. Studying the application of meta-reinforcement learning to multi-legged robots is therefore of great significance. This thesis focuses on the following problems.

Reinforcement learning algorithms suffer from long training times and slow convergence, as they must collect sufficient sample data through interaction with the environment to learn an optimal policy. When RL models are trained in simulation, sampling time dominates algorithm training time: each interaction between the agent and the environment is slow, which makes training inefficient. This thesis applies two parallelization solutions to this problem. First, a simulation environment is built on Isaac Gym, which runs the full reinforcement learning sampling process on the GPU to exploit parallel computing and improve computational efficiency; simulation demonstration software with a tunable interface is also designed for convenient use of these functions. Second, a PyBullet simulation environment is built on the parallel framework Ray, with a more accurate quadruped robot model; multiple physics engines run in parallel on the CPU and store their results in a shared object store to shorten training time. Simulated experiments on multiple robot tasks verify that both parallel solutions significantly improve sampling efficiency and accelerate training.
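The core idea of the Isaac Gym solution is that every environment's state lives in GPU tensors, so one policy forward pass and one batched physics step serve thousands of environments at once. The following is a minimal PyTorch sketch of that rollout pattern; `ParallelEnvs` is a hypothetical stand-in with toy dynamics, not Isaac Gym's actual API.

```python
import torch

class ParallelEnvs:
    """Hypothetical stand-in for a GPU-resident vectorized simulator
    (Isaac Gym keeps all states and actions in tensors like these)."""
    def __init__(self, num_envs, obs_dim, act_dim, device="cuda"):
        self.device = torch.device(device if torch.cuda.is_available() else "cpu")
        self.act_dim, self.obs_dim = act_dim, obs_dim
        self.obs = torch.zeros(num_envs, obs_dim, device=self.device)

    def step(self, actions):
        # One batched step for ALL environments at once: no Python loop
        # over envs and no CPU<->GPU copies inside the rollout.
        mix = torch.ones(self.act_dim, self.obs_dim, device=self.device)
        self.obs = self.obs + 0.01 * torch.tanh(actions @ mix)  # toy dynamics
        rewards = -self.obs.pow(2).mean(dim=1)                  # toy reward
        return self.obs, rewards

envs = ParallelEnvs(num_envs=4096, obs_dim=48, act_dim=12)
policy = torch.nn.Linear(48, 12).to(envs.device)

with torch.no_grad():
    for _ in range(100):               # rollout horizon
        actions = policy(envs.obs)     # one forward pass covers 4096 envs
        obs, rew = envs.step(actions)  # one batched simulation step
```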
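The Ray-based solution distributes sampling instead of batching it: each Ray actor owns an independent headless PyBullet engine on its own CPU core, and returned trajectories reside in Ray's shared object store until the learner fetches them. A minimal sketch follows; robot model loading and the actual transition bookkeeping are omitted.

```python
import ray
import pybullet as p

@ray.remote
class RolloutWorker:
    """Each actor process owns one PyBullet engine (DIRECT = headless)."""
    def __init__(self):
        self.client = p.connect(p.DIRECT)  # one physics engine per worker
        p.setGravity(0, 0, -9.8, physicsClientId=self.client)

    def rollout(self, steps):
        traj = []
        for t in range(steps):
            p.stepSimulation(physicsClientId=self.client)
            traj.append(t)  # real code would record (s, a, r, s') tuples
        return traj

ray.init()
workers = [RolloutWorker.remote() for _ in range(8)]  # 8 parallel engines
refs = [w.rollout.remote(240) for w in workers]       # asynchronous sampling
# Results land in Ray's shared object store; ray.get() fetches them.
trajectories = ray.get(refs)
```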
PEARL (Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables) greatly shortens training time by separating task inference from control. During task inference, however, it may model task-irrelevant features and ignore task-specific information, leading to poor performance when new tasks are dissimilar to the training tasks. To address this, the thesis uses contrastive learning and experience replay techniques to optimize the PEARL model, proposing the improved PEARL-BYOL algorithm. First, BYOL, a contrastive learning model with an asymmetric structure, is introduced into PEARL: task data are treated as positive samples for computing the contrastive loss, and BYOL requires no additional negative samples. The BYOL contrastive loss is added to the loss function of PEARL's task inference module so that the inferred context accurately reflects the characteristics of the task. Second, past data are relabeled for new tasks to improve the agent's adaptability during the testing phase. Experimental results show that the optimized algorithm performs better on new tasks, especially dissimilar ones, adapting faster and more stably.

Although meta-reinforcement learning lets a simulated quadruped robot adapt well to new environments, end-to-end control that maps observations directly to the torque of each leg joint is unsuitable for physical robots because of its high computational complexity and slow convergence. This thesis therefore adopts a hierarchical control framework. Based on the improved PEARL-BYOL algorithm, a unified policy network is trained to determine the optimal gait features from the quadruped robot's current observations; a low-level motion controller, built from the dynamic model, then computes the expected motor torque of each leg joint to realize walking control. Experimental results show that this method can dynamically adjust the robot's gait in simulation and generalizes well to new tasks outside the training distribution. Basic walking control of the Unitree Go1 quadruped robot has also been achieved on the physical platform.
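To make the PEARL-BYOL objective described above concrete, the sketch below shows how a BYOL-style contrastive term can be attached to a context encoder's loss in PyTorch. The encoder sizes, the weighting coefficient `beta`, and the random "views" are illustrative placeholders, not the thesis's actual networks or data pipeline.

```python
import torch
import torch.nn.functional as F

def byol_loss(online_pred, target_proj):
    # BYOL regression loss, 2 - 2*cos(pred, target): positive pairs only,
    # no negative samples are needed.
    online_pred = F.normalize(online_pred, dim=-1)
    target_proj = F.normalize(target_proj, dim=-1)
    return 2 - 2 * (online_pred * target_proj).sum(dim=-1).mean()

obs_dim, z_dim = 32, 16                            # hypothetical sizes
online_encoder = torch.nn.Linear(obs_dim, z_dim)   # stands in for PEARL's context encoder
predictor = torch.nn.Linear(z_dim, z_dim)          # asymmetry: online side only
target_encoder = torch.nn.Linear(obs_dim, z_dim)   # EMA copy, never backpropagated
for prm in target_encoder.parameters():
    prm.requires_grad_(False)

# Two "views" drawn from the same task's context batch (placeholder data).
view_a, view_b = torch.randn(64, obs_dim), torch.randn(64, obs_dim)
pred = predictor(online_encoder(view_a))
with torch.no_grad():
    target = target_encoder(view_b)

beta = 0.1                                   # hypothetical loss weight
pearl_inference_loss = torch.tensor(0.0)     # placeholder for PEARL's own losses
total_loss = pearl_inference_loss + beta * byol_loss(pred, target)

# After each gradient step, the target network tracks the online one by EMA.
tau = 0.99
with torch.no_grad():
    for p_t, p_o in zip(target_encoder.parameters(), online_encoder.parameters()):
        p_t.mul_(tau).add_(p_o, alpha=1 - tau)
```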
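The hierarchical framework splits control across two rates: a high-level policy updates gait features slowly, while a low-level controller converts them into joint torques every control tick. The toy sketch below illustrates this decomposition; the gait generator and PD law are placeholders, whereas the thesis derives the low-level controller from the robot's dynamic model.

```python
import numpy as np

def high_level_policy(obs):
    """Stand-in for the trained PEARL-BYOL policy: maps the robot's
    observation to gait features (step frequency, stride, body height)."""
    return np.array([1.5, 0.12, 0.28])        # hypothetical gait feature vector

def low_level_controller(gait, q, qdot, t):
    """Toy low-level layer: a gait generator produces reference joint
    angles, and a PD law turns tracking error into motor torques."""
    freq, stride, _height = gait
    q_ref = 0.5 * stride * np.sin(2 * np.pi * freq * t) * np.ones_like(q)
    kp, kd = 40.0, 1.0
    return kp * (q_ref - q) - kd * qdot       # desired torque per joint

q, qdot = np.zeros(12), np.zeros(12)          # 12 actuated joints on a quadruped
obs = np.zeros(48)
gait = high_level_policy(obs)                 # high level: updated at a low rate
for k in range(500):                          # low level: runs every control tick
    tau = low_level_controller(gait, q, qdot, t=k * 0.002)
    # ...apply tau to the simulator or robot, then refresh q, qdot from sensors
```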