
Research On Multi-agent Confrontation Algorithm Based On Deep Reinforcement Learning

Posted on: 2024-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: J T Xue
Full Text: PDF
GTID: 2558307079469904
Subject: Electronic information
Abstract/Summary:
As decision-making tasks in the real world grow more complex, multi-agent confrontation scenarios appear widely in many fields, such as robot combat, board games, military decision-making, stock trading, and game-AI competitions, and representative algorithms for such scenarios have gradually become a focus of research. However, multi-agent deep reinforcement learning still faces many challenges in adversarial settings, including large estimation bias in the multi-agent action-value function, complex agent action spaces, low utilization of training samples, and unreasonable credit assignment. How to address these problems effectively has become a popular research direction for multi-agent deep reinforcement learning in confrontation scenarios.

Focusing on two of these problems, estimation bias in the multi-agent action-value function and credit assignment among agents, this thesis combines existing deep reinforcement learning and graph neural network algorithms to conduct research in multi-agent confrontation scenarios. The SMAC simulation environment is used as the test scenario, and the representative QMIX algorithm is selected as the baseline. The main work is as follows:

(1) To address estimation bias in the multi-agent action-value function, this thesis proposes an improved ADP-mix algorithm. Building on the network structure of the traditional QMIX algorithm, it incorporates the idea of abstract dynamic programming, uses value iteration to improve the DRQN policy-training network of each individual agent, and modifies the final loss function, thereby promoting cooperation among the agents and improving the final training performance. Experiments on the SMAC simulation platform show that, compared with the QMIX baseline, the proposed ADP-mix algorithm effectively mitigates estimation bias in the multi-agent action-value function; in many simulation scenarios it achieves a higher final win rate and return.

(2) To address credit assignment among agents, this thesis proposes an improved AG-mix algorithm based on ADP-mix. The mixing network of ADP-mix is improved with a graph neural network: a GINE graph neural network enhances the algorithm's utilization of the Q functions and its ability to fit the joint value function, and a self-attention mechanism computes each agent's contribution to the joint value, thereby optimizing credit assignment among the agents. Experiments on the SMAC simulation platform show that the proposed AG-mix algorithm effectively improves on the credit assignment of the original algorithm; in many simulation scenarios the final win rate and return are further improved.

(3) To address practical shortcomings of the basic SMAC simulation platform, such as high time cost, limited visualization, and poor data confidentiality, this thesis designs an auxiliary SMAC simulation platform. On top of the original platform, it adds functional modules for experiment configuration management, algorithm performance visualization, and user data encryption, providing researchers with more stable and efficient algorithm performance testing services.
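The QMIX-family methods discussed above share one core idea: each agent keeps its own Q-function, and a mixing network combines them into a joint value monotonically, so greedy per-agent actions remain consistent with the greedy joint action. The sketch below illustrates this value-mixing and one-step target computation in miniature; it is not the thesis's ADP-mix implementation, and the function names, the fixed mixing weights, and the scalar values are illustrative assumptions only (a real mixing network generates state-dependent weights).

```python
def mix_joint_q(agent_qs, weights, bias):
    """QMIX-style monotonic mixing (toy version): the joint value is a
    weighted sum of per-agent Q-values with non-negative weights, so each
    agent's greedy action stays consistent with the greedy joint action."""
    assert all(w >= 0.0 for w in weights), "monotonicity needs non-negative weights"
    return sum(w * q for w, q in zip(weights, agent_qs)) + bias

def td_target(reward, next_joint_q, gamma=0.99, done=False):
    """One-step TD target used to train the mixed joint value."""
    return reward if done else reward + gamma * next_joint_q

# Two agents with illustrative greedy Q-values:
q_tot = mix_joint_q([1.5, 2.0], weights=[0.5, 0.5], bias=0.1)   # 1.85
target = td_target(reward=1.0, next_joint_q=q_tot)              # 2.8315
```

In a full implementation the weights and bias would come from hypernetworks conditioned on the global state, and the loss would be the squared error between the mixed value and this target.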
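The credit-assignment idea in item (2), using self-attention to weigh each agent's contribution to the joint value, can be sketched as follows. This is a minimal illustration, not the AG-mix network: the GINE graph encoder is omitted, and the `scores` argument is a stand-in assumption for the query-key products that attention would compute from encoded agent features.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_credit(agent_qs, scores):
    """Attribute the joint value to agents via softmax attention weights.
    Each agent's contribution is its weight times its Q-value, and the
    joint value is the sum of contributions."""
    weights = softmax(scores)
    contributions = [w * q for w, q in zip(weights, agent_qs)]
    return weights, contributions, sum(contributions)

# Three agents; higher score -> larger share of the credit:
weights, contribs, joint_q = attention_credit([1.0, 2.0, 0.5], [2.0, 1.0, 0.5])
```

Because the weights sum to one, the per-agent contributions give an explicit decomposition of the joint value, which is the property that makes credit assignment interpretable.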
Keywords/Search Tags: Multi-Agent Confrontation Algorithm, Deep Reinforcement Learning, Abstract Dynamic Programming, Graph Neural Network