
Research On Confronting Policy Generation Method Of Multi-Agent System Based On Reinforcement Learning

Posted on: 2023-09-28  Degree: Master  Type: Thesis
Country: China  Candidate: L Y Niu
GTID: 2532307169979069  Subject: Army commanding learn
Abstract/Summary:
Unmanned platforms are gradually replacing human beings at the forefront of future wars and producing new kinds of combat effectiveness. Expanding capability through cooperation among multiple unmanned platforms is becoming the key path to multiplying unmanned combat capability, and it has attracted attention from countries around the world. To study coordination schemes for multiple unmanned platforms, this thesis simplifies away factors of actual warfare such as strategic deception and situation acquisition error. It models the unmanned platforms as a Multi-Agent System (MAS), studies how MAS is used in confrontation, and combines artificial intelligence technology with MAS to enhance the coordination ability of the MAS in confrontation scenarios. Confrontation policy generation is an indispensable part of multi-agent system confrontation and has a great influence on how well the MAS coordinates. Against this background, and targeting the problem that the number and types of agents change dynamically, this thesis draws on Reinforcement Learning, Meta-Reinforcement Learning, Transfer Learning and other methods, and focuses on multi-agent reinforcement learning methods that are adaptable and universal in dynamic situations. The main contents of this thesis are as follows:

(1) A framework for MAS confrontation policy generation based on Reinforcement Learning is proposed. The structure and organization of the MAS used in confrontation are analyzed, and the confrontation scenario is designed. By analyzing the challenges that multi-agent policy generation methods face in confrontation, the requirements for a reinforcement-learning-based MAS confrontation policy are put forward.

(2) A multi-agent policy transfer method based on the replay buffer is designed. The method aims to improve the universality of multi-agent reinforcement learning algorithms in dynamic situations and to speed up learning when the algorithm faces a new situation. First, a dynamic situation is modeled as multiple simple situations arranged in descending chronological order, and the knowledge from the previous situation is used to accelerate training in the new situation. In each simple situation, the multi-agent reinforcement learning method treats the situation sample data saved in the experience replay buffer as the situation knowledge. The state transition model is then used as a situation identifier that measures the difference between two situations, so as to judge whether the previous situation knowledge is suitable for transfer. Finally, experiments show that the policy transfer method can accelerate algorithm training in new situations.

(3) A multi-task, multi-agent continuous and self-adaptive learning method based on meta-reinforcement learning is proposed. To enhance the adaptability of multi-agent policies in different situations, this thesis proposes a multi-agent reinforcement learning method for multiple tasks. First, the method uses the same Deep RNN Q Network to learn a well-performing policy in each task. Second, in the stage of training the unified model, samples are obtained through interaction between the single-task policies and the environment. Then the meta-reinforcement learning training method is used to learn the unified model, maximizing knowledge transfer and minimizing interference from transfer. Finally, to verify the effectiveness of the proposed method, experiments are carried out on the Meta-SMAC platform. The results show the universality of the proposed method in different situations.

(4) The Meta-SMAC simulation environment for multi-task learning is developed to verify the performance of the algorithms. To verify the proposed method, this thesis extends the widely used multi-agent experimental environment SMAC according to the characteristics of the research problem, producing a new simulation environment for multi-task learning, Meta-SMAC. The proposed algorithm is compared with Reptile and FOMAML in this simulation environment to test its adaptability and universality in dynamic situations.
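
The transfer decision described in item (2) can be illustrated with a small sketch. The abstract does not say how the state transition model is built or how the difference between two situations is measured, so everything concrete here is an assumption made for illustration: a tabular, count-based transition model fitted from (state, action, next_state) samples in a replay buffer, the mean total-variation distance as the difference measure, a threshold of 0.2, and the hypothetical function names `fit_transition_model`, `situation_distance` and `should_transfer`.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple

# One replay-buffer sample: (state, action, next_state). Rewards are omitted
# because this sketch only compares transition dynamics.
Transition = Tuple[Hashable, Hashable, Hashable]
Model = Dict[Tuple[Hashable, Hashable], Dict[Hashable, float]]


def fit_transition_model(buffer: Iterable[Transition]) -> Model:
    """Estimate P(next_state | state, action) by counting buffer samples."""
    counts = defaultdict(Counter)
    for state, action, next_state in buffer:
        counts[(state, action)][next_state] += 1
    model: Model = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        model[key] = {ns: n / total for ns, n in counter.items()}
    return model


def situation_distance(old: Model, new: Model) -> float:
    """Mean total-variation distance over (state, action) pairs seen in both."""
    shared = old.keys() & new.keys()
    if not shared:
        return 1.0  # assumption: nothing comparable means maximally different
    total = 0.0
    for key in shared:
        outcomes = old[key].keys() | new[key].keys()
        total += 0.5 * sum(
            abs(old[key].get(o, 0.0) - new[key].get(o, 0.0)) for o in outcomes
        )
    return total / len(shared)


def should_transfer(
    old_buffer: Iterable[Transition],
    new_buffer: Iterable[Transition],
    threshold: float = 0.2,  # assumed value, not taken from the thesis
) -> bool:
    """Reuse old-situation knowledge only if the dynamics look close enough."""
    distance = situation_distance(
        fit_transition_model(old_buffer), fit_transition_model(new_buffer)
    )
    return distance <= threshold


# Toy buffers: one state, one action, two possible outcomes.
old_buffer = [(0, "move", 1)] * 8 + [(0, "move", 0)] * 2
similar_buffer = [(0, "move", 1)] * 7 + [(0, "move", 0)] * 3
different_buffer = [(0, "move", 1)] * 1 + [(0, "move", 0)] * 9
```

With these toy buffers, `should_transfer(old_buffer, similar_buffer)` accepts the transfer and `should_transfer(old_buffer, different_buffer)` rejects it. In a setting like SMAC, where observations are continuous, a learned neural transition model with a prediction-error measure would be a more plausible stand-in for the counting step; the tabular version is used here only to keep the example short.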
Keywords/Search Tags:Multi-Agent System, Multi-Agent Reinforcement Learning, Transfer Learning, Meta Learning, Knowledge Transfer