The rapid advancement of science and technology has led to the widespread use of multi-agent systems in the real world, mainly because of their ability to make decisions while interacting with the environment and with other agents. Real environments, however, are complex and unpredictable, so the test environment inevitably differs in some way from the training environment. Algorithms must therefore ensure that agents remain robust to such changes and generalize effectively across tasks and environments. With this in mind, this thesis explores the generalization of multi-agent reinforcement learning algorithms. The primary research work is as follows:

1. To handle the generalization problem caused by changes in some agents' policies, a Human-Preference-based Multi-Agent Policy Ensemble (HPMAPE) algorithm is proposed. The algorithm improves the policy ensemble by jointly considering the short-term benefit and the long-term cumulative return of each action, enabling an agent to flexibly apply different policies in different situations. By leveraging the strengths of the individual policies, HPMAPE effectively enhances the agent's generalization capability while avoiding the efficiency loss caused by excessive exploration. In the Multi-Agent Particle Environment experiments, HPMAPE exhibits the best final performance and training speed. Moreover, when some agents' policies change abnormally, HPMAPE achieves a success rate approximately 50% higher than the other baseline algorithms. The experimental results show that HPMAPE trains effectively and generalizes well.
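The scoring rule at the heart of HPMAPE can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names (`short_term_value`, `long_term_value`), the convex-combination weight `beta`, and the greedy selection rule are not taken from the thesis, which only states that each action's short-term benefit and long-term cumulative return are weighed together.

```python
# Hypothetical sketch of HPMAPE-style ensemble action selection.
# The value estimators and the weight beta are illustrative assumptions.
import numpy as np

def hpmape_select_action(obs, policies, short_term_value, long_term_value, beta=0.5):
    """Pick one action from an ensemble of policies.

    Each member policy proposes an action; every proposal is scored by a
    convex combination of an estimated immediate benefit and an estimated
    long-term cumulative return, and the best-scoring proposal is executed.
    """
    candidates = [pi(obs) for pi in policies]  # one proposal per ensemble member
    scores = [
        beta * short_term_value(obs, a) + (1.0 - beta) * long_term_value(obs, a)
        for a in candidates
    ]
    return candidates[int(np.argmax(scores))]
```

Under this reading, the weight trades off immediate benefit against long-term return, which is one way an ensemble can switch policies when the situation changes without drifting into excessive exploration.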
2. To solve the generalization problem caused by changes in the number of agents, a Role-Parameter-Sharing-based Multi-Agent Policy Ensemble (RPSMAPE) algorithm is proposed. The algorithm uses the Kendall correlation coefficient to assess the correlation between agents and dynamically assigns their roles; agents with the same role share policy parameters (a role-assignment sketch follows this summary). This resolves the performance degradation caused by changes in the number of agents and also removes the computational burden of learning multiple policies in HPMAPE. In test scenarios where the number of agents changes, RPSMAPE's success rate significantly outperforms the comparison algorithms: when the scenario scales from 5×5 to 10×10, MADDPG-S's performance decreases by 17.08%, while RPSMAPE's remains at 70.24%. The experimental results show that RPSMAPE significantly improves training efficiency and generalization.

3. To deal with the generalization issue resulting from changes in tasks, Adaptive Context-based Multi-Agent Reinforcement Learning (ACMARL) is proposed. First, to facilitate knowledge sharing and to bridge the gap between testing and training environments, it uses a hybrid encoder with an attention module to build a representation of the context information. Second, ACMARL divides the joint action space into action subspaces, with different context representations corresponding to different action subspaces, which greatly alleviates the dimension-explosion problem (sketches of both mechanisms follow this summary). Finally, in a Minecraft-like grid-world test environment, ACMARL reduces time consumption by 24.06%, and its convergence speed and final performance are significantly better than those of the other comparison algorithms. When applied to a new task different from the training task, its success rate is 15.46%–17.85% higher than the other comparison algorithms, demonstrating the effectiveness of the algorithm.
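For contribution 2, the Kendall-based role assignment could look roughly like the following sketch. The per-agent behaviour statistics, the correlation threshold, and the greedy merging rule are all assumptions made for illustration; only the use of the Kendall correlation coefficient and parameter sharing within a role come from the abstract.

```python
# A minimal sketch of RPSMAPE-style role assignment. The feature vectors
# and the threshold are assumed; the thesis's exact grouping rule may differ.
import numpy as np
from scipy.stats import kendalltau

def assign_roles(agent_stats, threshold=0.6):
    """Group agents whose behaviour statistics are Kendall-correlated.

    agent_stats: (n_agents, n_features) array. Agents whose pairwise
    Kendall tau exceeds `threshold` are merged into one role, and all
    agents in a role share a single set of policy parameters.
    """
    n = len(agent_stats)
    roles = list(range(n))  # start with one role per agent
    for i in range(n):
        for j in range(i + 1, n):
            tau, _ = kendalltau(agent_stats[i], agent_stats[j])
            if tau > threshold:
                # merge j's role into i's role (simple relabelling union)
                old, new = roles[j], roles[i]
                roles = [new if r == old else r for r in roles]
    return roles  # role id per agent; one shared policy per unique id
```

Because every agent is indexed only by its role, adding or removing agents at test time does not change the set of learned policies, which is the property the abstract attributes to RPSMAPE.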
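For contribution 3, a hybrid context encoder with an attention module might be structured as below. The layer sizes, the mean pooling, and self-attention over recent transitions are assumptions; the abstract only states that a hybrid encoder with attention produces the context representation.

```python
# Illustrative sketch of an attention-based context encoder in the spirit
# of ACMARL; the architecture details are assumptions, not the thesis's.
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    def __init__(self, obs_dim, hidden_dim=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(obs_dim, hidden_dim)  # per-step embedding
        self.attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())

    def forward(self, transitions):
        # transitions: (batch, steps, obs_dim) window of recent experience
        h = torch.relu(self.embed(transitions))
        h, _ = self.attn(h, h, h)   # self-attention across the window
        z = h.mean(dim=1)           # pool to a single context vector
        return self.head(z)         # task/context representation
```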
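The action-subspace decomposition can likewise be sketched as a context-conditioned mask over the joint action space. The dictionary mapping a context to its legal actions and the `-inf` masking are illustrative choices, not the thesis's actual mechanism; they show only how restricting each context to a subspace keeps the effective action dimension small.

```python
# Hedged sketch of context-conditioned action-subspace selection.
import torch

def masked_action_logits(logits, context_id, subspaces):
    """Restrict a policy's logits to the action subspace for this context.

    subspaces: dict mapping a discrete context id to a list of legal
    action indices. Actions outside the subspace get -inf logits, so the
    policy only searches a small slice of the joint action space.
    """
    mask = torch.full_like(logits, float("-inf"))
    idx = torch.tensor(subspaces[context_id])
    mask[..., idx] = 0.0
    return logits + mask
```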