
Research On Multi-agent Confrontation Algorithm Based On Deep Reinforcement Learning

Posted on: 2024-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: J T Xue
Full Text: PDF
GTID: 2558307079469904
Subject: Electronic information
Abstract/Summary:
As decision-making tasks in the real world grow more complex, multi-agent confrontation scenarios appear widely in many fields, such as robot combat, board games, military decision-making, stock trading, and game-AI competitions, and representative algorithms for such scenarios have gradually become a focus of research. However, multi-agent deep reinforcement learning still faces many challenges in adversarial settings, including large estimation bias in the multi-agent action-value function, complex agent action spaces, low utilization of training samples, and unreasonable credit assignment. How to address these problems effectively has become a popular research direction for multi-agent deep reinforcement learning in confrontation scenarios.

Focusing on two of these problems, estimation bias in the multi-agent action-value function and credit assignment among agents, this thesis combines existing deep reinforcement learning and graph neural network algorithms to conduct research in multi-agent confrontation scenarios. The SMAC simulation environment is used as the test scenario, and the representative QMIX algorithm is selected as the baseline. The main work is as follows:

(1) To address estimation bias in the multi-agent action-value function, this thesis proposes an improved ADP-mix algorithm. Building on the network structure of the traditional QMIX algorithm, it incorporates the idea of abstract dynamic programming, uses value iteration to improve the DRQN policy-training network of each individual agent, and modifies the final loss function, thereby promoting cooperation among the agents and improving the final training performance. Experiments on the SMAC simulation platform show that, compared with the QMIX baseline, the proposed ADP-mix algorithm effectively mitigates estimation bias in the multi-agent action-value function; in many simulation scenarios it achieves a higher final win rate and return.

(2) To address credit assignment among agents, this thesis proposes an improved AG-mix algorithm based on ADP-mix. The mixing network of ADP-mix is improved with a graph neural network: a GINE graph neural network enhances the algorithm's utilization of the Q functions and its ability to fit the joint value function, and a self-attention mechanism computes each agent's contribution to the joint value, thereby optimizing credit assignment among the agents. Experiments on the SMAC simulation platform show that the proposed AG-mix algorithm effectively improves on the credit assignment of the original algorithm; in many simulation scenarios the final win rate and return are further improved.

(3) To address practical shortcomings of the basic SMAC simulation platform, such as high time cost, limited visualization, and poor data confidentiality, this thesis designs an auxiliary SMAC simulation platform. On top of the original platform, it adds functional modules for experiment configuration management, algorithm performance visualization, and user data encryption, providing researchers with more stable and efficient algorithm performance testing services.
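The QMIX-family methods discussed above share one core idea: each agent keeps its own Q-function, and a mixing network combines them into a joint value monotonically, so greedy per-agent actions remain consistent with the greedy joint action. The sketch below illustrates this value-mixing and one-step target computation in miniature; it is not the thesis's ADP-mix implementation, and the function names, the fixed mixing weights, and the scalar values are illustrative assumptions only (a real mixing network generates state-dependent weights).

```python
def mix_joint_q(agent_qs, weights, bias):
    """QMIX-style monotonic mixing (toy version): the joint value is a
    weighted sum of per-agent Q-values with non-negative weights, so each
    agent's greedy action stays consistent with the greedy joint action."""
    assert all(w >= 0.0 for w in weights), "monotonicity needs non-negative weights"
    return sum(w * q for w, q in zip(weights, agent_qs)) + bias

def td_target(reward, next_joint_q, gamma=0.99, done=False):
    """One-step TD target used to train the mixed joint value."""
    return reward if done else reward + gamma * next_joint_q

# Two agents with illustrative greedy Q-values:
q_tot = mix_joint_q([1.5, 2.0], weights=[0.5, 0.5], bias=0.1)   # 1.85
target = td_target(reward=1.0, next_joint_q=q_tot)              # 2.8315
```

In a full implementation the weights and bias would come from hypernetworks conditioned on the global state, and the loss would be the squared error between the mixed value and this target.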
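The credit-assignment idea in item (2), using self-attention to weigh each agent's contribution to the joint value, can be sketched as follows. This is a minimal illustration, not the AG-mix network: the GINE graph encoder is omitted, and the `scores` argument is a stand-in assumption for the query-key products that attention would compute from encoded agent features.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_credit(agent_qs, scores):
    """Attribute the joint value to agents via softmax attention weights.
    Each agent's contribution is its weight times its Q-value, and the
    joint value is the sum of contributions."""
    weights = softmax(scores)
    contributions = [w * q for w, q in zip(weights, agent_qs)]
    return weights, contributions, sum(contributions)

# Three agents; higher score -> larger share of the credit:
weights, contribs, joint_q = attention_credit([1.0, 2.0, 0.5], [2.0, 1.0, 0.5])
```

Because the weights sum to one, the per-agent contributions give an explicit decomposition of the joint value, which is the property that makes credit assignment interpretable.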
Keywords/Search Tags: Multi-Agent Confrontation Algorithm, Deep Reinforcement Learning, Abstract Dynamic Programming, Graph Neural Network