Research And Implementation Of Collaboration Strategy Generation Technology Based On Multi-Agent Deep Reinforcement Learning

Posted on: 2022-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: C R Zhao
GTID: 2532307169978949
Subject: Computer Science and Technology
Abstract/Summary:
Multi-agent collaboration is an important problem in artificial intelligence. Multi-agent collaboration technology can be applied widely, for example in traffic light control, autonomous vehicle coordination, and resource management, and it provides effective support for implementing these applications. A growing number of researchers use multi-agent deep reinforcement learning algorithms to solve multi-agent collaboration problems. In this setting, collaboration faces two main challenges. The first is how to achieve mutual understanding between agents in a partially observable environment, so that each agent can understand the overall situation and make the best decision. The second is how to allocate rewards reasonably, according to each agent's collaborative contribution, in a multi-agent environment with only sparse rewards. This thesis studies multi-agent collaboration in competitive scenarios. We propose the Attention-Aware Actor (Tri-A) algorithm and the Graph Value Decomposition (GVD) algorithm, build a prototype system on this basis, and verify it experimentally. Our main contributions are as follows:

(1) Existing methods struggle to achieve mutual understanding between agents in an environment with limited communication and partial observability. To address this, we propose the Attention-Aware Actor (Tri-A) model, which is based on the Actor-Critic (AC) framework. From the perspective of each agent, the model uses the information about surrounding agents observed within the agent's sight range to construct the Co Co-Graph, and reconstructs the local observation through this Co Co-Graph. Agents then make decisions based on the reconstructed observations and generate actions with a tendency to cooperate (or attack). Because the model acts only on the actor of each agent, it can be regarded as a plug-in: it can be added to any multi-agent deep reinforcement learning algorithm that uses the AC framework to improve the agent’s decision-making ability.

(2) To address multi-agent credit assignment in sparse-reward environments, we propose the Graph Value Decomposition (GVD) algorithm. GVD uses the interaction dynamics of the agents during the distributed execution stage to model the relationships between agents as a bi-level graph architecture, which reveals the contribution of our agents in attacking the enemy agents and the priority with which our agents attack them at each time step. We build the graph value decomposition network on this bi-level graph. The network integrates the individual Q-value of each agent into a global Q-value and updates the policy of each agent through back-propagation of the policy gradient, which yields a reasonable credit assignment.

(3) Based on the above results, we designed and implemented a multi-agent collaboration prototype system based on multi-agent deep reinforcement learning algorithms, and verified it experimentally on the SMAC platform. Experimental results show that, compared with existing methods, the proposed models significantly improve the collaboration performance and learning speed of multi-agent systems.
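The observation-reconstruction idea in contribution (1) can be illustrated with a minimal sketch. This is not the Tri-A model itself: the abstract does not specify how the Co Co-Graph is built or how attention is computed, so the scaled dot-product scoring, the concatenation step, and all names here (`visible_neighbors`, `reconstruct_observation`, the `pos` and `feat` keys) are assumptions made only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def visible_neighbors(own_pos, others, sight_range):
    """Keep only the agents whose position lies within the sight range.

    Each entry of `others` is a dict with hypothetical keys "pos" and "feat".
    """
    return [o for o in others if math.dist(own_pos, o["pos"]) <= sight_range]


def reconstruct_observation(own_feat, neighbor_feats):
    """Append an attention-weighted summary of visible neighbors to own features.

    Assumption: scaled dot-product scores between the agent's own features
    and each neighbor's features. The thesis may use a different scheme.
    """
    dim = len(own_feat)
    if not neighbor_feats:
        # No agent in sight: pad with zeros so the output size stays fixed.
        return list(own_feat) + [0.0] * dim
    scale = math.sqrt(dim)
    scores = [
        sum(a * b for a, b in zip(own_feat, nb)) / scale for nb in neighbor_feats
    ]
    weights = softmax(scores)
    summary = [
        sum(w * nb[i] for w, nb in zip(weights, neighbor_feats)) for i in range(dim)
    ]
    return list(own_feat) + summary


if __name__ == "__main__":
    others = [
        {"pos": (1.0, 0.0), "feat": [0.5, 0.2]},
        {"pos": (5.0, 0.0), "feat": [0.9, 0.9]},  # outside the sight range
    ]
    seen = visible_neighbors((0.0, 0.0), others, sight_range=2.0)
    print(reconstruct_observation([1.0, 0.0], [o["feat"] for o in seen]))
```

In a setup like this, the reconstructed vector would be fed to the agent's actor in place of the raw local observation, which is what makes the step usable as a plug-in for Actor-Critic methods.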
Keywords/Search Tags: Multi-Agent Collaboration, Value Decomposition, Multi-Agent Reinforcement Learning, Attention Mechanism, Graph Network