Multi-agent Reinforcement Learning Algorithm Evaluation And Explainability Research

Posted on: 2024-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: S P Lu
Full Text: PDF
GTID: 2568307103473414
Subject: Cyberspace security
Abstract/Summary:
In recent years, with the combination of deep learning and reinforcement learning, multi-agent reinforcement learning (MARL) algorithms have made remarkable progress in building cooperative artificial intelligence. MARL has become a key approach to collaboration problems in multi-agent systems and is widely used in fields such as intelligent driving and unmanned aerial vehicles. Although some MARL algorithms based on value decomposition or centralized value functions aim to improve performance and scalability, the training cost and data efficiency of MARL algorithms are rarely discussed. In addition, the opacity and lack of explainability of MARL algorithms restrict the application of multi-agent systems in industry and daily life. To address these problems, this thesis studies the evaluation and explainability of cooperative MARL algorithms. The main contributions are as follows:

(1) Research on MARL algorithm evaluation. In actual distributed deployment, algorithm cost and data utilization must be considered. First, to evaluate cost, this thesis proposes the number of floating-point operations per second, the number of neural network parameters, and the amount of communication as indicators of the training cost of a MARL algorithm. Second, to describe how efficiently a MARL algorithm uses data, this thesis proposes two indicators computed over the training process, the area under the training curve (AUC) and the highest winning rate (HWR), which take a dynamic and a static point of view, respectively. Finally, this thesis evaluates 13 MARL algorithms on 23 StarCraft Multi-Agent Challenge (SMAC) maps, using 8 million time steps as the training standard. The experimental results show that on simple tasks most algorithms perform well, so a lower-cost algorithm is the better choice; on moderately difficult tasks, algorithms with complex neural network structures perform better but cost more; and on very difficult tasks most algorithms perform poorly, so algorithms with simple neural network structures are recommended to save training cost.

(2) Research on MARL algorithm explainability. To help researchers understand algorithm behavior, this thesis uses post-hoc visual explanation techniques and builds a MARL visualization system that reveals training details through a Statistical view, an Epoch view, and an Episode view. First, based on analysis of these views, the overall strategy learning process is examined through changes in event-sequence data (such as state, action, and Q value). Second, based on the visualization data, this thesis finds that the agents' health values are unevenly distributed in lost games, and therefore proposes QMIX-Gini, an improved scheme that introduces the Gini coefficient into the reward function as a regularization term to guide the agents to learn cooperative strategies that spread damage. The experimental results show that QMIX-Gini surpasses QMIX on all 22 SMAC maps in terms of AUC, HWR, the average Gini coefficient, and the average number of survivors.
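The indicators named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the thesis's exact definitions, so everything below is an assumption made for illustration: AUC is taken as the trapezoidal area under the win-rate curve normalized by the step range, HWR as the maximum observed win rate, and the Gini term as a penalty subtracted from the environment reward. The function names (`auc`, `hwr`, `gini`, `shaped_reward`) and the `weight` parameter are hypothetical and do not come from the thesis.

```python
from typing import Sequence


def auc(steps: Sequence[float], win_rates: Sequence[float]) -> float:
    """Trapezoidal area under the win-rate curve, divided by the step range.

    The normalization is an assumption; it keeps the value in [0, 1]
    when win rates are in [0, 1].
    """
    if len(steps) != len(win_rates) or len(steps) < 2:
        raise ValueError("need at least two matching (step, win_rate) points")
    area = 0.0
    for i in range(1, len(steps)):
        width = steps[i] - steps[i - 1]
        area += 0.5 * (win_rates[i] + win_rates[i - 1]) * width
    return area / (steps[-1] - steps[0])


def hwr(win_rates: Sequence[float]) -> float:
    """Highest winning rate observed at any evaluation point."""
    return max(win_rates)


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(values)
    # Standard rank-based form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum((i + 1) * v for i, v in enumerate(ordered))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def shaped_reward(env_reward: float, health: Sequence[float],
                  weight: float = 0.1) -> float:
    """Environment reward minus a Gini penalty on agent health.

    Both the subtractive form and the default weight are assumptions,
    not the QMIX-Gini formulation from the thesis.
    """
    return env_reward - weight * gini(health)
```

Under these assumptions, a team whose remaining health is spread evenly (for example `[1, 1, 1, 1]`) has a Gini coefficient of 0 and incurs no penalty, while a team whose health is concentrated in one agent (`[0, 0, 0, 4]`) has a Gini coefficient of 0.75 and receives a lower shaped reward.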
Keywords/Search Tags: MARL, Algorithm evaluation, Visualization technology, Explainability