Research On Robot Motion Control Based On Meta Reinforcement Learning

Posted on: 2020-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: T Hao
Full Text: PDF
GTID: 2428330623959097
Subject: Engineering
Abstract/Summary:
Intelligent robots are machines that can perceive, think, and act, and that can learn as human beings do. When they face a new problem, they can draw on previously acquired knowledge to handle it quickly, which helps them adapt to a changing environment. Robot motion control is a typical sequential decision problem: the robot must respond with a coherent sequence of actions immediately after observing the state of the environment. Reinforcement learning is an important branch of machine learning that can be used to solve sequential decision problems.

In recent years, many studies have applied deep reinforcement learning algorithms to robot motion, including the policy-based Actor-Critic algorithm, the Deep Deterministic Policy Gradient (DDPG) algorithm, and the Asynchronous Advantage Actor-Critic (A3C) algorithm. These algorithms achieve good results on a single task, but they need a great deal of training time. In addition, agents trained with them generalize poorly and suffer from catastrophic forgetting: they cannot make use of experience gained on earlier tasks, so the robot cannot adapt quickly to a changing environment.

The goal of meta-learning is to learn what tasks have in common and how they differ from a small number of samples, thereby improving the agent's ability to adapt quickly. Adding meta-learning to reinforcement learning lets the agent adapt quickly to the various tasks in an environment from only a small amount of experience, bringing it closer to human-like intelligence. This thesis builds on the MAML (Model-Agnostic Meta-Learning) framework and applies TRPO (Trust Region Policy Optimization), a policy gradient optimization algorithm. Gradients are computed twice in each round of training so that the agent learns quickly and adapts to new tasks. The work consists of the following parts:

(1) Within the MAML framework, the Proximal Policy Optimization (PPO) algorithm, a deep reinforcement learning algorithm, is modified by adding a label network to the original policy gradient algorithm to optimize action selection.
(2) An advantage function is added so that high-quality actions are reinforced more strongly in the agent's later interactions with the environment, which improves the agent's learning ability.
(3) Adding external environment context and OU (Ornstein-Uhlenbeck) action noise on top of MAML is studied as a way to speed up the agent's adaptation.
(4) The algorithm is evaluated in the MuJoCo simulation environment using the HalfCheetah and 3D Ant benchmark environments. The agent must control the half cheetah and the ant so that they run at a specified speed and in a specified direction. Tasks are sampled from a uniform random distribution and from a binomial random distribution.

The simulation results show that the proposed algorithm improves the agent's adaptability to new tasks.
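The three mechanisms named in the abstract (task sampling, OU action noise, and the two gradient computations per round in MAML) can be illustrated with a minimal sketch. It is not the thesis's implementation and rests on several assumptions: the names `sample_tasks`, `ou_noise`, and `maml_step` are hypothetical; target speeds are assumed to come from the uniform distribution and directions from the binomial one; the OU parameters are arbitrary; and the meta-update uses plain gradient descent on a toy quadratic loss rather than TRPO or PPO on a MuJoCo return.

```python
import numpy as np


def sample_tasks(rng, n_tasks, v_low=0.0, v_high=2.0):
    """Draw hypothetical tasks: a target speed and a running direction."""
    # Assumption: speeds are uniform on [v_low, v_high).
    speeds = rng.uniform(v_low, v_high, size=n_tasks)
    # Assumption: one binomial trial per task; 1 -> forward (+1), 0 -> backward (-1).
    directions = 2 * rng.binomial(1, 0.5, size=n_tasks) - 1
    return speeds, directions


def ou_noise(rng, n_steps, dim, theta=0.15, sigma=0.2, dt=1e-2, mu=0.0):
    """Simulate an Ornstein-Uhlenbeck process with the Euler-Maruyama scheme."""
    x = np.full(dim, mu, dtype=float)
    out = np.empty((n_steps, dim))
    for t in range(n_steps):
        # Mean-reverting drift plus Gaussian diffusion; successive samples are correlated.
        x = x + theta * (mu - x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(dim)
        out[t] = x
    return out


def maml_step(params, targets, alpha=0.1, beta=0.5):
    """One meta-update for the toy loss L_i(p) = 0.5 * (p - c_i)**2."""
    # First gradient: adapt the shared parameter to each task.
    inner_grad = params - targets
    adapted = params - alpha * inner_grad
    # Second gradient: differentiate the post-adaptation loss with respect to
    # the shared parameter; d(adapted)/d(params) = 1 - alpha.
    outer_grad = (adapted - targets) * (1.0 - alpha)
    return params - beta * outer_grad.mean()


rng = np.random.default_rng(0)
speeds, directions = sample_tasks(rng, n_tasks=8)
targets = speeds * directions          # signed target velocity per task
params = 0.0
for _ in range(200):
    params = maml_step(params, targets)
noise = ou_noise(rng, n_steps=100, dim=6)  # dim=6 is an illustrative action size
```

For this toy loss the meta-parameter settles at the mean of the task targets, the starting point from which a single inner step gets closest to every task on average. In a reinforcement learning setting the two gradients would instead be policy gradients estimated from sampled trajectories, and the noise would be added to the policy's actions during exploration.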
Keywords/Search Tags: reinforcement learning, meta-learning, robot motion control, rapid adaptation