
Research On Policy-making Method Of Wargame Deduction Based On Multi-agent Reinforcement Learning

Posted on: 2023-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Ma
Full Text: PDF
GTID: 2532306836472314
Subject: Electronic and communication engineering

Abstract/Summary:
Wargame deduction is an effective way to simulate actual war: it is an important tool for studying military operations and a key to future warfare. Rule-based policy-making lacks the ability to adapt to different maps and different opponents, whereas reinforcement learning enables autonomous learning and more intelligent policies during deduction; however, the characteristics of the multi-agent setting currently limit its application in wargame deduction. Building on reinforcement learning, this thesis introduces a value decomposition method and a hierarchical policy-making method to address the credit assignment problem and the sparse reward problem, respectively. The work covers the following two aspects.

Although multi-agent policy gradient methods have achieved great success, the credit assignment problem has not been effectively solved, because under the centralized training framework evaluation usually relies on the global state. At the same time, the dynamic multi-agent environment and the high-dimensional inputs of the neural networks also challenge the reliability of the centralized critic network. To address these problems, this thesis decomposes the centralized evaluation, introducing the idea of value decomposition into the multi-agent Actor-Critic algorithm. It further replaces the previous state-value evaluation with action-value evaluation, proposing an action-value decomposed multi-agent method. The method first evaluates each agent's local observation in a local-layer network to obtain local values, and then integrates them into a total value in a mixing layer for learning. In the mixing process, a two-stage attention mechanism measures each agent's contribution to the total value, allowing the method both to estimate the total value more accurately and to measure each agent's local impact. Under the same value decomposition framework, this thesis also proposes a state-value decomposed multi-agent method, likewise aimed at alleviating the credit assignment problem. To verify the effectiveness of these algorithms, experiments are carried out in StarCraft II micromanagement environments; the results show that the proposed methods outperform other reinforcement learning algorithms.

The sparse reward problem is common in multi-agent environments, because rewards are granted only for complex joint actions, and some environments rarely provide positive rewards. It is also difficult to build a practical model from the raw states and actions of the wargame platform. This thesis therefore proposes a hierarchical policy-making multi-agent method. A manager network based on a semi-Markov decision process sets goals, which then guide the learning of the agents under the worker network. A hypernetwork is used to embed the manager network's goal information into the worker network's evaluation process, strengthening the communication between the two layers. For the wargame deduction platform, this thesis designs the state space, a two-layer action space, and the reward function. To improve the generality of the algorithm, it also proposes a modeling-free policy-making method that uses anomaly detection and a clustering algorithm to obtain goals automatically. Experiments on the "Miaosuan" land-war platform and a gridworld platform verify that the method can produce complex policies and has strong generality.
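The mixing idea described above can be illustrated with a minimal NumPy sketch. This is not the thesis's architecture (which uses learned neural networks and a two-stage attention inside a full Actor-Critic loop); it is a simplified, hypothetical mixer in which per-agent keys come from local observations and a query comes from the global state, and the resulting attention weights combine local action-values into a total value:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class AttentionMixer:
    """Mixes per-agent local action-values into a total value Q_tot.

    Hypothetical simplification: stage 1 maps each agent's local
    observation to a key; stage 2 maps the global state to a query;
    attention over keys gives each agent's contribution weight.
    """
    def __init__(self, obs_dim, state_dim, embed_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W_k = rng.normal(size=(obs_dim, embed_dim)) * 0.1   # per-agent keys
        self.W_q = rng.normal(size=(state_dim, embed_dim)) * 0.1  # global query

    def __call__(self, local_qs, local_obs, state):
        # local_qs: (n_agents,) chosen-action values, one per agent
        keys = local_obs @ self.W_k                  # (n_agents, embed_dim)
        query = state @ self.W_q                     # (embed_dim,)
        scores = keys @ query / np.sqrt(keys.shape[1])
        weights = softmax(scores)                    # per-agent contribution
        return float(weights @ local_qs), weights
```

Because the weights form a convex combination, Q_tot always lies between the smallest and largest local value, which is one way the mixer keeps the total estimate tied to the local impacts it is meant to measure.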
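The manager/worker control flow can likewise be sketched. In the hedged example below, the semi-Markov structure is reduced to the manager emitting a new goal every `k` worker steps, and the hypernetwork is reduced to a single linear map that turns the current goal into the worker critic's weights; all names and dimensions are illustrative, not the thesis's exact design:

```python
import numpy as np

class HyperWorkerCritic:
    """Worker critic whose weights are generated by a hypernetwork
    conditioned on the manager's goal (illustrative simplification)."""
    def __init__(self, goal_dim, obs_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Hypernetwork: maps a goal vector to the worker's value weights
        self.H = rng.normal(size=(goal_dim, obs_dim)) * 0.1

    def value(self, goal, obs):
        w = goal @ self.H          # goal-conditioned evaluation weights
        return float(w @ obs)

def run_episode(manager_goals, obs_seq, k=3):
    """Semi-MDP loop: the manager picks a new goal every k worker steps,
    and the worker is evaluated under that goal in between."""
    critic = HyperWorkerCritic(goal_dim=2, obs_dim=4)
    values, goal = [], None
    for t, obs in enumerate(obs_seq):
        if t % k == 0:                      # manager decision point
            goal = manager_goals[t // k]
        values.append(critic.value(goal, obs))
    return values
```

Embedding the goal into the worker's evaluation weights, rather than merely appending it to the observation, is what lets the manager's choice reshape how every worker state is scored.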
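Finally, the modeling-free goal discovery can be approximated by clustering visited states and treating the cluster centers as candidate goals. The sketch below uses a plain k-means written from scratch; the thesis additionally combines this with anomaly detection, which is omitted here:

```python
import numpy as np

def discover_goals(states, n_goals=2, iters=20, seed=0):
    """Cluster visited states and return the centroids as candidate
    goals (a sketch of the idea; the anomaly-detection filtering step
    is omitted)."""
    rng = np.random.default_rng(seed)
    # initialize centroids from randomly chosen visited states
    centroids = states[rng.choice(len(states), n_goals, replace=False)]
    for _ in range(iters):
        # assign each state to its nearest centroid
        d = np.linalg.norm(states[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned states
        for g in range(n_goals):
            if (labels == g).any():
                centroids[g] = states[labels == g].mean(axis=0)
    return centroids, labels
```

Because the goals are extracted from the data rather than hand-designed, the same procedure transfers across platforms without re-modeling, which matches the generality claim the experiments test.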
Keywords/Search Tags: Wargame deduction, Multi-agent reinforcement learning, Value-decomposed policy gradient, Hierarchical policy-making method