Multi-agent reinforcement learning has attracted considerable attention in recent years, and value decomposition has become one of its most actively studied problems. In value decomposition methods, the joint value function of the environment is expressed as a combination of each agent's individual value function in order to improve the performance of the joint policy. However, current value decomposition methods suffer from two main problems: (1) their learning efficiency is low, and since learning efficiency is an important performance indicator of an algorithm, improving it is of great significance; (2) they explore insufficiently, and since exploration ability is crucial for multi-agent reinforcement learning, improving it helps the agents' policies avoid local optima and yields better-performing joint policies. To address these problems, this paper proposes the following solutions.

(1) This paper proposes WF-QMIX (Weighted Feedback-QMIX), an accelerated-convergence mechanism based on importance-weighted feedback. The algorithm improves the learning efficiency of value decomposition by introducing an additional set of action-value functions. First, an importance-weight parameter network assigns a set of importance weights to the agents' action values. Second, a selection gate structure is introduced: when the weighted action-value combination, passed through the mixing network, yields a joint value closer to the target value, the algorithm reduces the difference between the original action-value combination and the weighted one and updates the model accordingly, accelerating convergence; otherwise, it enlarges the difference between the two combinations to improve exploration (see the first sketch below). Experimental results show that WF-QMIX outperforms the comparison algorithms in both convergence speed and final performance.

(2) This paper proposes WFVAE (Weighted-Feedback QMIX with Variational Exploration), which extends the exploration mechanism with variational exploration. The algorithm addresses the insufficient exploration of value decomposition methods by introducing a latent behavior-mode variable that adjusts the policies with which agents interact with the environment. First, the latent behavior-mode variable is introduced and associated with the agents' policies. Second, by varying this latent variable, the algorithm dynamically adjusts the agents' interaction policies, enlarging the exploration space and further improving exploration ability (see the second sketch below). Experimental results show that WFVAE outperforms the comparison algorithms.
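The following is a minimal sketch of one possible reading of the WF-QMIX weighted-feedback gate described in contribution (1), not the paper's actual implementation. The component names (`ImportanceWeightNet`, `SimpleMixer`, `weighted_feedback_loss`), network sizes, and the exact form of the gate and loss terms are all assumptions; only the overall idea (weight the agents' action values, mix both combinations, and pull them together or push them apart depending on which is closer to the target) comes from the abstract.

```python
import torch
import torch.nn as nn


class ImportanceWeightNet(nn.Module):
    """Hypothetical network mapping the global state to one importance weight per agent."""
    def __init__(self, state_dim, n_agents):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_agents), nn.Softplus())

    def forward(self, state):
        return self.net(state)  # (batch, n_agents), non-negative weights


class SimpleMixer(nn.Module):
    """Placeholder mixing network (the real QMIX mixer uses state-conditioned
    hypernetworks with non-negative weights to keep the mixing monotonic)."""
    def __init__(self, n_agents, state_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_agents + state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, agent_qs, state):
        return self.net(torch.cat([agent_qs, state], dim=-1))  # (batch, 1)


def weighted_feedback_loss(mixer, weight_net, agent_qs, state, target, beta=1.0):
    """One interpretation of the WF-QMIX selection gate.

    agent_qs: (batch, n_agents) chosen action values from the agent networks
    target:   (batch, 1) bootstrapped target joint value
    """
    weights = weight_net(state)                 # importance weights for each agent
    weighted_qs = weights * agent_qs            # re-weighted action-value combination

    q_tot_orig = mixer(agent_qs, state)         # joint value of the original combination
    q_tot_weighted = mixer(weighted_qs, state)  # joint value of the weighted combination

    td_orig = (q_tot_orig - target).abs()
    td_weighted = (q_tot_weighted - target).abs()

    # Gate: if the weighted combination is closer to the target, pull the two
    # combinations together (exploit the feedback); otherwise push them apart
    # to encourage exploration.
    gate = (td_weighted < td_orig).float()
    gap = (agent_qs - weighted_qs).pow(2).mean(dim=-1, keepdim=True)
    feedback_loss = (gate * gap - (1.0 - gate) * gap).mean()

    td_loss = (q_tot_orig - target).pow(2).mean()
    return td_loss + beta * feedback_loss


# Toy usage with random data (dimensions are illustrative only).
batch, n_agents, state_dim = 32, 3, 10
mixer = SimpleMixer(n_agents, state_dim)
weight_net = ImportanceWeightNet(state_dim, n_agents)
loss = weighted_feedback_loss(mixer, weight_net,
                              torch.randn(batch, n_agents),
                              torch.randn(batch, state_dim),
                              torch.randn(batch, 1))
loss.backward()
```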
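The second sketch illustrates the latent behavior-mode idea behind WFVAE in contribution (2) under stated assumptions: a variational encoder produces a latent variable z, the agent's utility network is conditioned on z, and resampling z changes the interaction policy. The encoder input, network shapes, and the names `BehaviorModeEncoder` and `LatentConditionedAgent` are hypothetical; the abstract does not specify how the latent variable is inferred or regularized.

```python
import torch
import torch.nn as nn


class BehaviorModeEncoder(nn.Module):
    """Variational encoder producing a latent behavior-mode variable z from a
    trajectory summary (hypothetical input choice)."""
    def __init__(self, input_dim, latent_dim):
        super().__init__()
        self.mu = nn.Linear(input_dim, latent_dim)
        self.log_std = nn.Linear(input_dim, latent_dim)

    def forward(self, traj_summary):
        mu, log_std = self.mu(traj_summary), self.log_std(traj_summary)
        z = mu + log_std.exp() * torch.randn_like(mu)  # reparameterized sample
        return z, mu, log_std


class LatentConditionedAgent(nn.Module):
    """Agent utility network conditioned on z: different samples of z induce
    different interaction policies, widening the exploration space."""
    def __init__(self, obs_dim, latent_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))  # per-action values


def kl_to_standard_normal(mu, log_std):
    """KL(q(z) || N(0, I)) for a diagonal Gaussian posterior, used as a
    VAE-style regularizer on the behavior-mode latent."""
    return 0.5 * (mu.pow(2) + (2 * log_std).exp() - 2 * log_std - 1).sum(dim=-1).mean()


# Toy usage: resampling z changes the behavior mode, hence the interaction policy.
obs_dim, summary_dim, latent_dim, n_actions = 8, 16, 4, 5
encoder = BehaviorModeEncoder(summary_dim, latent_dim)
agent = LatentConditionedAgent(obs_dim, latent_dim, n_actions)
z, mu, log_std = encoder(torch.randn(1, summary_dim))
q_values = agent(torch.randn(1, obs_dim), z)   # action values under this behavior mode
action = q_values.argmax(dim=-1)               # greedy action for this mode
reg = kl_to_standard_normal(mu, log_std)       # keeps the latent space regular
```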