
Research On Meta-Learning Methods Towards High Sample-Efficiency Reinforcement Learning

Posted on: 2022-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: J T Huang
Full Text: PDF
GTID: 2558307169481294
Subject: Computer Science and Technology
Abstract/Summary:
Since DeepMind proposed the DQN algorithm in 2013, bringing deep learning into reinforcement learning, deep reinforcement learning has achieved impressive success in single-player games by combining with search, planning, and other techniques. Its success in competitive games and its combination with multi-agent systems have given rise to the new field of multi-agent deep reinforcement learning. Although deep reinforcement learning has achieved remarkable results in both single-agent and multi-agent environments, it suffers from extremely low sample efficiency, which greatly limits the scenarios in which reinforcement learning can be applied. Improving the sample efficiency of deep reinforcement learning has therefore become a critical problem in promoting the application of deep reinforcement learning algorithms.

In recent years, meta-learning has attracted attention as a paradigm for improving model performance because of its successful application to few-shot learning, unsupervised learning, and reinforcement learning. As a method of "learning how to learn", meta-learning improves performance on future tasks by learning from experience. It also offers an alternative approach to the sample-efficiency problem in deep reinforcement learning.

In this thesis, targeting the extremely low sample efficiency of deep reinforcement learning and building on a summary of current work, we propose a meta actor-critic framework that addresses the learning cost of the current meta loss framework. We formalize the optimization of the meta actor-critic framework as a bi-level optimization problem. Based on this framework, we propose application methods for single-agent and multi-agent environments.

In the single-agent scenario, we propose a meta attention method that combines meta-learning with the attention mechanism. In this method, a meta attention network is acquired through meta-learning. Unlike previous meta-learning methods, our meta attention method not only plays a role in the training stage but can also participate in the agent's decision-making process at every time step. Compared with existing methods, it removes the limitation that the attention mechanism in reinforcement learning can only handle multi-source information or image information, and it realizes attention matching between the agent's actor and critic. Experiments demonstrate the superiority of the meta attention method.

In the multi-agent environment, we use the meta loss generation method proposed in the meta actor-critic framework and extend the existing single-agent meta loss method to the multi-agent field. In addition to comparing the performance of our method with existing methods in the multi-agent environment, we also try combining the two to improve short-term and long-term performance at the same time. Experiments show that the meta actor-critic framework in the multi-agent environment can accelerate learning and promote a higher level of cooperation among agents.
Keywords/Search Tags:Reinforcement Learning, Multi-agent System, Meta-learning, Attention Mechanism, Bi-level Optimization