Research and Application of Multi-Agent Cooperative Strategies Based on Reinforcement Learning

Posted on: 2024-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: X Z Chen
Full Text: PDF
GTID: 2568307061968779
Subject: Electronic information
Abstract/Summary:
As artificial intelligence continues to advance, deep reinforcement learning has achieved notable results in single-agent settings. When applied to multi-agent scenarios, however, it encounters new difficulties, including environmental non-stationarity, inefficient inter-agent communication, and the challenge of allocating rewards appropriately. These problems seriously reduce the efficiency of cooperation between agents. Enabling agents to work together to accomplish designated objectives is therefore of considerable practical significance. To address these problems with reinforcement learning methods, this thesis studies multi-agent cooperative environments. The main work is as follows:

1. A multi-agent reinforcement learning algorithm based on a recurrent neural network is proposed to address the partial observability problem in multi-agent cooperative environments. The algorithm implements the Actor network as a bidirectional recurrent neural network. The past environment observations and agent actions stored in the network give each agent as much reference information as possible when making decisions, which improves the effectiveness of its policy and reduces the impact of local observation. In addition, a difference reward allocation mechanism is added to make explicit each agent's contribution to completing the task and to encourage agents to select more appropriate actions, so that correct behavior policies can be learned. Comparative experiments are carried out in a simulated cooperative task environment and a passive localization task environment. The experimental results show that the proposed method improves algorithm performance more effectively when the task environment is complex.

2. To address the credit assignment problem in multi-agent environments, a multi-agent reinforcement learning algorithm based on value decomposition is proposed. The algorithm uses a centralized, value-decomposed Critic network to compute the policy gradient, which is then used to update the policy network. With the Critic network configured this way, it becomes feasible to evaluate each agent's individual contribution to the overall system reward, reduce the impact of the dimensionality explosion, and improve training efficiency. To validate the approach, a comparative experiment is conducted in a simulated task environment; the results show that it improves task completion rates and training efficiency.

3. The current mainstream training framework for multi-agent reinforcement learning, "centralized training with decentralized execution", has a shortcoming: in the training stage, policies are learned from the observations of all agents, but in the execution stage each agent can access only its local observation, which degrades performance. This problem is especially prominent in collaborative task environments. A communication mechanism based on shared experience is therefore proposed. A storage space of a given size is set aside as an experience pool shared among the agents. In both the training and execution stages, agents may read from and write to the pool in parallel through explicit communication, so that each agent can infer the state of the overall task environment and cooperation between agents becomes more efficient. Finally, comparative experiments in a simulated task environment demonstrate the advantage of this method.
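The differential (difference) reward mechanism mentioned under item 1 is not specified in detail in this abstract. As a rough illustration only, the sketch below uses the classical counterfactual difference reward, in which each agent is credited with the global reward minus the global reward obtained when that agent's action is replaced by a default. The function names, the `None` default action, and the toy `coverage_reward` are all assumptions made for this example and are not taken from the thesis.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical action type: an integer target index, or None for "no action".
Action = Optional[int]


def coverage_reward(actions: Sequence[Action]) -> float:
    """Toy global reward (assumed): number of distinct targets the team covers."""
    return float(len({a for a in actions if a is not None}))


def difference_rewards(
    actions: Sequence[Action],
    global_reward: Callable[[Sequence[Action]], float],
    default_action: Action = None,
) -> List[float]:
    """Per-agent credit D_i = G(a) - G(a with agent i's action set to a default)."""
    g = global_reward(actions)
    rewards = []
    for i in range(len(actions)):
        counterfactual = list(actions)
        counterfactual[i] = default_action  # remove agent i's contribution
        rewards.append(g - global_reward(counterfactual))
    return rewards


# Agents 0 and 1 both cover target 0; agent 2 alone covers target 1.
team_actions = [0, 0, 1]
print(difference_rewards(team_actions, coverage_reward))  # [0.0, 0.0, 1.0]
```

In this toy case the two agents that duplicate each other's target receive no credit, while the agent whose action uniquely raises the team reward receives all of it. This is the kind of per-agent contribution signal the abstract describes, although the thesis may define it differently.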
Keywords/Search Tags: Multi-agent, Reinforcement learning, Cooperative control, Recurrent neural network, Value function decomposition