Research On Multi-Agent Combat Based On Value Decomposition Deep Reinforcement Learning

Posted on: 2024-08-13    Degree: Master    Type: Thesis
Country: China    Candidate: D Q Jin    Full Text: PDF
GTID: 2568306941990949    Subject: Information and Communication Engineering
Abstract/Summary:
Multi-agent systems are ubiquitous in real life and are widely used in fields such as multi-robot control, intelligent transportation, and military combat. With the arrival of the third wave of artificial intelligence, multi-agent competition has become increasingly prominent. Owing to its strong advantages in decision-making, value decomposition deep reinforcement learning has become a mainstream approach to multi-agent competition problems. In multi-agent competition scenarios, communication difficulties between agents lead to a credit allocation problem. Moreover, as the number of agents grows, the state space expands, resulting in poor utilization of state information and greater exploration difficulty. These problems hinder decision-making in multi-agent competition tasks. This thesis studies these issues in multi-agent competition scenarios and completes the following work:

First, we propose QMIX-HA, an algorithm based on a hypergraph and an attention mechanism. To address the credit allocation problem caused by the lack of communication and cooperation between agents, we introduce a hypergraph structure: the hidden-layer states of the individual agent networks are used to construct hypergraphs that retain the agents' observation information, and the agents' action-value functions are passed through a hypergraph convolution operation to obtain revised action-value functions, which promotes communication and cooperation. To make effective use of global state information, we further introduce a reward-query attention layer that uses the reward as the query to extract the global state information most relevant to the current task, improving the convergence speed and learning efficiency of the algorithm. Finally, we conduct comparative and ablation experiments on the StarCraft II micromanagement multi-agent competition simulation platform, and the results verify the effectiveness of the algorithm.

Second, to address insufficient exploration in multi-agent competition scenarios, we propose the SCE exploration method, driven by strangeness and curiosity, and apply it to the QMIX-HA algorithm proposed in Chapter 3. A network is trained to reconstruct observation values, and the reconstruction error is used as an exploration reward; once training reaches a certain number of rounds, a prediction network is additionally used to predict the state value function, and its prediction error is used as a further exploration reward to encourage exploration. We verify the SCE exploration method on the same simulation platform, and the experiments show that it effectively improves the performance of the algorithm.
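To make the two intrinsic-reward signals of the SCE method more concrete, the following is a minimal, hypothetical sketch rather than the thesis implementation. It assumes PyTorch; the module names, network sizes, and the switch_step threshold are illustrative choices only. An observation autoencoder supplies a "strangeness" bonus via its reconstruction error, and a state-value predictor supplies a "curiosity" bonus via its prediction error once training has passed the threshold.

```python
# Illustrative sketch of SCE-style intrinsic rewards (hypothetical, not the thesis code).
import torch
import torch.nn as nn


class ObsReconstructor(nn.Module):
    """Autoencoder over agent observations; reconstruction error -> 'strangeness' bonus."""
    def __init__(self, obs_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, obs_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(obs))


class ValuePredictor(nn.Module):
    """Predicts the state value from the global state; prediction error -> 'curiosity' bonus."""
    def __init__(self, state_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, 1))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def intrinsic_reward(recon: ObsReconstructor, pred: ValuePredictor,
                     obs: torch.Tensor, state: torch.Tensor,
                     target_value: torch.Tensor,
                     step: int, switch_step: int = 50_000) -> torch.Tensor:
    """Early training: reward novel observations via reconstruction error.
    Later training: additionally reward states whose value the predictor still gets wrong."""
    recon_error = (recon(obs) - obs).pow(2).mean(dim=-1)
    if step < switch_step:
        return recon_error
    pred_error = (pred(state).squeeze(-1) - target_value).pow(2)
    return recon_error + pred_error
```

In a sketch like this, the combined intrinsic reward would simply be added, with a weighting coefficient, to the environment reward used by the underlying value decomposition learner.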
Keywords/Search Tags:Multi-agent system, Confrontation, Value decomposition reinforcement learning, Credit allocation, Exploration method