Research On Multi-Agent Combat Based On Value Decomposition Deep Reinforcement Learning

Posted on: 2024-08-13    Degree: Master    Type: Thesis
Country: China    Candidate: D Q Jin    Full Text: PDF
GTID: 2568306941990949    Subject: Information and Communication Engineering
Abstract/Summary:
Multi-agent systems are ubiquitous in real life and are widely used in fields such as multi-robot control, intelligent transportation, and military combat. With the arrival of the third wave of artificial intelligence, multi-agent competition has become increasingly prominent. Owing to its strong advantages in decision-making, value decomposition deep reinforcement learning has become a mainstream approach to multi-agent competition problems. In multi-agent competition scenarios, communication difficulties between agents lead to a credit allocation problem. Moreover, as the number of agents grows, the state space expands, resulting in poor utilization of state information and greater exploration difficulty. These problems hinder decision-making in multi-agent competition tasks. This thesis studies these issues in multi-agent competition scenarios and completes the following work:

First, we propose QMIX-HA, an algorithm based on a hypergraph and an attention mechanism. To address the credit allocation problem caused by the lack of communication and cooperation between agents, we introduce a hypergraph structure: the hidden-layer states of the individual agent networks are used to construct hypergraphs that retain the agents' observation information, and the agents' action-value functions are passed through a hypergraph convolution operation to obtain revised action-value functions, which promotes communication and cooperation. To make effective use of global state information, we further introduce a reward-query attention layer that uses the reward as the query to extract the global state information most relevant to the current task, improving the convergence speed and learning efficiency of the algorithm. Finally, we conduct comparative and ablation experiments on the StarCraft II micromanagement multi-agent competition simulation platform, and the results verify the effectiveness of the algorithm.

Second, to address insufficient exploration in multi-agent competition scenarios, we propose the SCE exploration method, driven by strangeness and curiosity, and apply it to the QMIX-HA algorithm proposed in Chapter 3. A network is trained to reconstruct observation values, and the reconstruction error is used as an exploration reward; once training reaches a certain number of rounds, a prediction network is additionally used to predict the state value function, and its prediction error is used as a further exploration reward to encourage exploration. We verify the SCE exploration method on the same simulation platform, and the experiments show that it effectively improves the performance of the algorithm.
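To make the two intrinsic-reward signals of the SCE method more concrete, the following is a minimal, hypothetical sketch rather than the thesis implementation. It assumes PyTorch; the module names, network sizes, and the switch_step threshold are illustrative choices only. An observation autoencoder supplies a "strangeness" bonus via its reconstruction error, and a state-value predictor supplies a "curiosity" bonus via its prediction error once training has passed the threshold.

```python
# Illustrative sketch of SCE-style intrinsic rewards (hypothetical, not the thesis code).
import torch
import torch.nn as nn


class ObsReconstructor(nn.Module):
    """Autoencoder over agent observations; reconstruction error -> 'strangeness' bonus."""
    def __init__(self, obs_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, obs_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(obs))


class ValuePredictor(nn.Module):
    """Predicts the state value from the global state; prediction error -> 'curiosity' bonus."""
    def __init__(self, state_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, 1))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def intrinsic_reward(recon: ObsReconstructor, pred: ValuePredictor,
                     obs: torch.Tensor, state: torch.Tensor,
                     target_value: torch.Tensor,
                     step: int, switch_step: int = 50_000) -> torch.Tensor:
    """Early training: reward novel observations via reconstruction error.
    Later training: additionally reward states whose value the predictor still gets wrong."""
    recon_error = (recon(obs) - obs).pow(2).mean(dim=-1)
    if step < switch_step:
        return recon_error
    pred_error = (pred(state).squeeze(-1) - target_value).pow(2)
    return recon_error + pred_error
```

In a sketch like this, the combined intrinsic reward would simply be added, with a weighting coefficient, to the environment reward used by the underlying value decomposition learner.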
Keywords/Search Tags:Multi-agent system, Confrontation, Value decomposition reinforcement learning, Credit allocation, Exploration method