
Research On Policy-making Method Of Wargame Deduction Based On Multi-agent Reinforcement Learning

Posted on: 2023-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Ma
Full Text: PDF
GTID: 2532306836472314
Subject: Electronic and communication engineering

Abstract/Summary:
Wargame deduction is an effective way to simulate actual war: it is an important tool for studying military operations and a key to future warfare. Rule-based policy-making lacks the ability to adapt to different maps and different opponents, whereas reinforcement learning enables autonomous learning and more intelligent policies during deduction; however, the characteristics of the multi-agent setting currently limit its application in wargame deduction. Building on reinforcement learning, this thesis introduces a value decomposition method and a hierarchical policy-making method to address the credit assignment problem and the sparse reward problem, respectively. The work covers the following two aspects.

Although multi-agent policy gradient methods have achieved great success, the credit assignment problem has not been effectively solved, because under the centralized training framework evaluation usually relies on the global state. At the same time, the dynamic multi-agent environment and the high-dimensional inputs of the neural networks also challenge the reliability of the centralized critic network. To address these problems, this thesis decomposes the centralized evaluation, introducing the idea of value decomposition into the multi-agent Actor-Critic algorithm. It further replaces the previous state-value evaluation with action-value evaluation, proposing an action-value decomposed multi-agent method. The method first evaluates each agent's local observation in a local-layer network to obtain local values, and then integrates them into a total value in a mixing layer for learning. In the mixing process, a two-stage attention mechanism measures each agent's contribution to the total value, allowing the method both to estimate the total value more accurately and to measure each agent's local impact. Under the same value decomposition framework, this thesis also proposes a state-value decomposed multi-agent method, likewise aimed at alleviating the credit assignment problem. To verify the effectiveness of these algorithms, experiments are carried out in StarCraft II micromanagement environments; the results show that the proposed methods outperform other reinforcement learning algorithms.

The sparse reward problem is common in multi-agent environments, because rewards are granted only for complex joint actions, and some environments rarely provide positive rewards. It is also difficult to build a practical model from the raw states and actions of the wargame platform. This thesis therefore proposes a hierarchical policy-making multi-agent method. A manager network based on a semi-Markov decision process sets goals, which then guide the learning of the agents under the worker network. A hypernetwork is used to embed the manager network's goal information into the worker network's evaluation process, strengthening the communication between the two layers. For the wargame deduction platform, this thesis designs the state space, a two-layer action space, and the reward function. To improve the generality of the algorithm, it also proposes a modeling-free policy-making method that uses anomaly detection and a clustering algorithm to obtain goals automatically. Experiments on the "Miaosuan" land-war platform and a gridworld platform verify that the method can produce complex policies and has strong generality.
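The mixing idea described above can be illustrated with a minimal NumPy sketch. This is not the thesis's architecture (which uses learned neural networks and a two-stage attention inside a full Actor-Critic loop); it is a simplified, hypothetical mixer in which per-agent keys come from local observations and a query comes from the global state, and the resulting attention weights combine local action-values into a total value:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class AttentionMixer:
    """Mixes per-agent local action-values into a total value Q_tot.

    Hypothetical simplification: stage 1 maps each agent's local
    observation to a key; stage 2 maps the global state to a query;
    attention over keys gives each agent's contribution weight.
    """
    def __init__(self, obs_dim, state_dim, embed_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W_k = rng.normal(size=(obs_dim, embed_dim)) * 0.1   # per-agent keys
        self.W_q = rng.normal(size=(state_dim, embed_dim)) * 0.1  # global query

    def __call__(self, local_qs, local_obs, state):
        # local_qs: (n_agents,) chosen-action values, one per agent
        keys = local_obs @ self.W_k                  # (n_agents, embed_dim)
        query = state @ self.W_q                     # (embed_dim,)
        scores = keys @ query / np.sqrt(keys.shape[1])
        weights = softmax(scores)                    # per-agent contribution
        return float(weights @ local_qs), weights
```

Because the weights form a convex combination, Q_tot always lies between the smallest and largest local value, which is one way the mixer keeps the total estimate tied to the local impacts it is meant to measure.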
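The manager/worker control flow can likewise be sketched. In the hedged example below, the semi-Markov structure is reduced to the manager emitting a new goal every `k` worker steps, and the hypernetwork is reduced to a single linear map that turns the current goal into the worker critic's weights; all names and dimensions are illustrative, not the thesis's exact design:

```python
import numpy as np

class HyperWorkerCritic:
    """Worker critic whose weights are generated by a hypernetwork
    conditioned on the manager's goal (illustrative simplification)."""
    def __init__(self, goal_dim, obs_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Hypernetwork: maps a goal vector to the worker's value weights
        self.H = rng.normal(size=(goal_dim, obs_dim)) * 0.1

    def value(self, goal, obs):
        w = goal @ self.H          # goal-conditioned evaluation weights
        return float(w @ obs)

def run_episode(manager_goals, obs_seq, k=3):
    """Semi-MDP loop: the manager picks a new goal every k worker steps,
    and the worker is evaluated under that goal in between."""
    critic = HyperWorkerCritic(goal_dim=2, obs_dim=4)
    values, goal = [], None
    for t, obs in enumerate(obs_seq):
        if t % k == 0:                      # manager decision point
            goal = manager_goals[t // k]
        values.append(critic.value(goal, obs))
    return values
```

Embedding the goal into the worker's evaluation weights, rather than merely appending it to the observation, is what lets the manager's choice reshape how every worker state is scored.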
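Finally, the modeling-free goal discovery can be approximated by clustering visited states and treating the cluster centers as candidate goals. The sketch below uses a plain k-means written from scratch; the thesis additionally combines this with anomaly detection, which is omitted here:

```python
import numpy as np

def discover_goals(states, n_goals=2, iters=20, seed=0):
    """Cluster visited states and return the centroids as candidate
    goals (a sketch of the idea; the anomaly-detection filtering step
    is omitted)."""
    rng = np.random.default_rng(seed)
    # initialize centroids from randomly chosen visited states
    centroids = states[rng.choice(len(states), n_goals, replace=False)]
    for _ in range(iters):
        # assign each state to its nearest centroid
        d = np.linalg.norm(states[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned states
        for g in range(n_goals):
            if (labels == g).any():
                centroids[g] = states[labels == g].mean(axis=0)
    return centroids, labels
```

Because the goals are extracted from the data rather than hand-designed, the same procedure transfers across platforms without re-modeling, which matches the generality claim the experiments test.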
Keywords/Search Tags: Wargame deduction, Multi-agent reinforcement learning, Value-decomposed policy gradient, Hierarchical policy-making method