
Experience Replay In Multi-Agent Deep Reinforcement Learning

Posted on: 2021-03-19    Degree: Master    Type: Thesis
Country: China    Candidate: Y S Wang    Full Text: PDF
GTID: 2428330605474859    Subject: Software engineering

Abstract/Summary:
Experience replay enables an agent to reuse past experience samples for policy optimization. When a single agent interacts with the environment, experience replay improves learning efficiency. In a multi-agent system, however, an agent that updates its current policy with outdated experience samples may lose its advantage in the subsequent game, while an agent that attends only to the current game situation may overemphasize short-term gains at the expense of long-term returns. To address these problems, this thesis studies three questions: how to preserve experience samples, which experience samples to use, and how to use them to optimize the policy. The main research contents are as follows:

(1) This thesis first studies how to preserve experience samples. In a multi-agent game, non-stationarity creates a conflict: increasing the diversity of the experience samples prevents the agent from obtaining the maximum expected return, while decreasing it costs the policy its generalization ability. To ease this conflict, we propose concurrent experience replay based on the reservoir algorithm. Experimental results show that the proposed experience replay method increases the diversity of the experience samples while enhancing the generalization ability of the policy.

(2) This thesis then studies which experience samples to use. Some experience samples improve the optimization quality of the policy, while others reduce its stability. Moreover, in a multi-player game each participant may play a different role, so the same experience sample can have a different use-value for different agents. To alleviate these problems, we propose concurrent experience replay based on prioritized sampling, in which each agent evaluates the importance of an experience sample according to the current game situation. Experimental results on two multi-agent tasks demonstrate the effectiveness of the proposed method.

(3) Finally, this thesis studies how to use experience samples to optimize the policy. The performance of the policy depends not only on the quality of the experience samples but also on the learning algorithm. Because Q-learning uses the same action-value function to both select and evaluate actions, it suffers from overestimation; in multi-agent reinforcement learning, overestimation also makes agents overly optimistic about the future game situation. To further improve policy performance, we propose a new multi-agent policy gradient algorithm that combines double Q-learning with the multi-agent policy gradient while using the improved experience replay techniques above. Experimental results show that the proposed algorithm gives the agent a greater advantage in the game and enhances the robustness of the policy.
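The abstract names the reservoir algorithm behind contribution (1) but gives no pseudocode; the classic form is Algorithm R, under which every transition ever seen has an equal probability of remaining in the buffer. Below is a minimal sketch of a reservoir-sampled replay buffer, not the thesis's actual implementation; the class and method names (`ReservoirBuffer`, `add`, `sample`) are chosen here for illustration.

```python
import random

class ReservoirBuffer:
    """Fixed-capacity replay buffer filled by reservoir sampling (Algorithm R),
    so the stored experience stays a uniform sample of the whole history."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []
        self.num_seen = 0  # total transitions offered to the buffer so far

    def add(self, transition):
        self.num_seen += 1
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            # Keep the new transition with probability capacity / num_seen,
            # overwriting a uniformly chosen slot.
            idx = random.randrange(self.num_seen)
            if idx < self.capacity:
                self.storage[idx] = transition

    def sample(self, batch_size):
        # Assumes the buffer already holds at least batch_size transitions.
        return random.sample(self.storage, batch_size)
```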
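For contribution (2), the abstract does not specify how each agent scores a sample's importance. A common choice in the literature (Schaul et al.'s prioritized experience replay) is the magnitude of the TD error, recomputed per agent so that the same transition carries a different use-value for different agents. A sketch under that assumption; `alpha` and `beta` are the usual prioritization and importance-sampling exponents, not values taken from the thesis.

```python
import numpy as np

def sample_prioritized(priorities, batch_size, alpha=0.6, beta=0.4):
    """Sample indices in proportion to priority**alpha and return
    importance-sampling weights that correct the induced bias.

    `priorities` would typically hold |TD error| + eps per stored
    transition, maintained separately by each agent.
    """
    probs = np.asarray(priorities, dtype=np.float64) ** alpha
    probs /= probs.sum()
    idx = np.random.choice(len(probs), size=batch_size, p=probs)
    weights = (len(probs) * probs[idx]) ** (-beta)
    weights /= weights.max()  # normalize so the largest weight is 1
    return idx, weights
```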
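The double Q-learning idea in contribution (3) decouples action selection from action evaluation: the online network picks the next action, the target network scores it, which damps the overestimation the abstract describes. How this is fused with the multi-agent policy gradient is not detailed here; the sketch below shows only the double-Q target for a discrete-action critic, with `q_online` and `q_target` as assumed network names.

```python
import torch

def double_q_target(rewards, next_obs, dones, q_online, q_target, gamma=0.99):
    """Double Q-learning bootstrap target.

    rewards, dones: tensors of shape [batch]
    next_obs:       tensor of observations, shape [batch, obs_dim]
    q_online/q_target: networks mapping observations to [batch, n_actions]
    """
    with torch.no_grad():
        # Online network selects the greedy next action...
        next_actions = q_online(next_obs).argmax(dim=1, keepdim=True)
        # ...and the target network evaluates that action.
        next_values = q_target(next_obs).gather(1, next_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_values
```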
Keywords/Search Tags: Reinforcement Learning, Multi-Agent, Deep Learning, Experience Selection