
Multi-agent Transfer Reinforcement Learning With Efficient Exploration

Posted on: 2022-06-15 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: W Z Liu | Full Text: PDF
GTID: 1528307061452504 | Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
The intelligence of multi-agent systems, an important part of distributed artificial intelligence, is widely studied in applications such as multi-robot coordination, intelligent transportation, and war gaming. In the past decade, deep-neural-network-based methods for solving complex models have developed rapidly as storage technology and computing power have both improved. Building on this, trial-and-error-based deep reinforcement learning (DRL) combines the powerful non-linear function approximation of deep neural networks with the ideas of dynamic programming, and has garnered particular attention in artificial intelligence research. How to combine DRL with multi-agent systems has become a hot topic in distributed artificial intelligence in recent years. However, because of environment non-stationarity, credit assignment, sparse rewards, non-monotonic tasks, and related problems, it is challenging for agents to explore the environment cooperatively, which prevents them from improving the performance of the system. Hence, challenges remain when dealing with multi-agent problems using DRL methods, and solving them will be a key step toward stronger artificial intelligence, so the study of DRL for multi-agent systems is significant. The exploration ability of the agents directly influences the quality of the data collected from the environment, and in turn the performance of the neural networks; efficient exploration is therefore the key to policy optimization and performance improvement.

Against this background, the thesis pursues two main objectives: first, combining DRL and multi-agent systems properly, improving the efficiency of cooperative exploration for multiple agents, and enabling fast computation of optimal policies for complex tasks; second, learning from easier tasks toward challenging ones via transfer learning, and achieving efficient exploration in multi-agent systems through knowledge transfer between agents or tasks. Accordingly, three scientific questions need to be solved: (1) how to decompose a complex task into easier sub-tasks to reduce the difficulty of the problem; (2) how to implement DRL in multi-agent systems with a dynamic number of agents, and build a knowledge-transfer framework between new and old agents to guide the exploration of the new agents; and (3) how to build a transfer-learning framework across different tasks to achieve knowledge transfer, efficient multi-agent exploration, and fast solution of complex tasks. To address these three questions, the main contents of the thesis are as follows.

First, we build the basic frameworks of DRL and multi-agent reinforcement learning (MARL) on the Markov decision process and the decentralized partially observable Markov decision process, respectively. By comparing the modelling of single-agent DRL with that of MARL, we explain the cause of environment non-stationarity in MARL. For the credit-assignment issue, the paradigms of cooperative tasks and fully cooperative tasks are defined, and the corresponding value-based MARL algorithms are introduced; the drawbacks of these algorithms are explained through the relationships among the additivity, monotonicity, and individual-global-max (IGM) conditions. For knowledge transfer in DRL, we introduce the basic concepts of transfer learning under the independent-and-identically-distributed sampling assumption of traditional machine learning. For the basic theory of transfer reinforcement learning, we provide the problem description, the evaluation methods, several popular transfer reinforcement learning approaches, and multi-agent transfer reinforcement learning (MATRL). This part supplies the model frameworks and theory on which the remaining contents build.
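As a brief illustration of the three conditions named above (standard value-decomposition notation, not necessarily the thesis's own), let $Q_{tot}$ be the joint action-value function, $Q_i$ the per-agent utilities, and $\boldsymbol{\tau},\mathbf{a}$ the joint action-observation history and joint action:

\[
\text{IGM:}\quad \arg\max_{\mathbf{a}} Q_{tot}(\boldsymbol{\tau},\mathbf{a})
= \Big(\arg\max_{a_1} Q_1(\tau_1,a_1),\,\dots,\,\arg\max_{a_n} Q_n(\tau_n,a_n)\Big),
\]
\[
\text{Additivity (e.g. VDN):}\quad Q_{tot}=\sum_{i=1}^{n} Q_i(\tau_i,a_i),
\qquad
\text{Monotonicity (e.g. QMIX):}\quad \frac{\partial Q_{tot}}{\partial Q_i}\ge 0,\ \forall i.
\]

Additivity and monotonicity are each sufficient for IGM but not necessary, which is precisely the representational limitation of these value-based algorithms discussed above.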
Second, for multi-agent systems with cooperative tasks, we study MARL algorithms with task decomposition to tackle the problems of a large exploration space, long training times, and local optima. By making full use of the knowledge about the different sub-tasks described in the rewards, we decompose the original reward into several sub-rewards and thereby decompose the task. Learning with these sub-rewards yields multiple value functions for the sub-tasks, which guide the optimization of the joint policy and improve exploration. Specifically, we propose multi-agent Q-learning with task decomposition for discrete action spaces and multi-agent deterministic policy gradient with task decomposition for continuous action spaces. We analyze the superiority of the proposed algorithms by comparing the Q-values of different policies, and the performance of the algorithms has been verified empirically in different simulations.
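As a hedged sketch of the decomposition idea (illustrative notation of ours, not the thesis's exact formulation): if the reward splits into $K$ sub-rewards, each induces its own value function, and linearity of expectation recovers the value function of the original task:

\[
r(s,\mathbf{a})=\sum_{k=1}^{K} r_k(s,\mathbf{a}),
\qquad
Q_k^{\pi}(s,\mathbf{a})=\mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{\infty}\gamma^{t} r_k(s_t,\mathbf{a}_t)\,\Big|\,s_0=s,\,\mathbf{a}_0=\mathbf{a}\Big],
\qquad
Q^{\pi}=\sum_{k=1}^{K} Q_k^{\pi}.
\]

The sub-task value functions can therefore be learned separately and still combine into a consistent signal for optimizing the joint policy.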
Third, we propose a MATRL method for an incremental number of agents, which transfers model knowledge from a source domain to a target domain within a teacher-student framework and thereby achieves fast learning in the target domain. Dropping the fixed-number-of-agents assumption of traditional MARL methods, we restructure the input space of the critic networks for scenarios with an incremental number of agents, so that the agents can adapt to learning in the target domain. To capture the knowledge of the teacher agents, we build an experience replay buffer for knowledge transfer that stores the advised outputs of the teacher agents as supervision for the further training of the student agents. For learning the student agents' policies, we propose knowledge transfer via policy distillation, in which the student agents learn not only from the environment but also from the teacher agents (a generic sketch of this objective is given after the final contribution below). This mechanism improves the exploration efficiency of the multiple agents in the target domain.

Finally, building on the above work on task decomposition and knowledge transfer, we propose a MATRL algorithm with successor features across different tasks by designing a vectorial task space. To unify the descriptions of tasks in the source and target domains, we decompose the reward functions into linear combinations of a feature function and different weight vectors, following the idea of successor representation. We then build a successor-feature model for multi-agent systems and propose a multi-agent deep deterministic policy gradient algorithm with successor features. To solve complex tasks with sparse or non-monotonic rewards, we compute a jumpstart policy for the target task by making full use of the models pre-trained in the source domain, which yields more efficient exploration for the multiple agents in the target domain. We verify the proposed algorithms in two simulations: multi-agent cooperative box-pushing with sparse rewards and multi-agent cooperative predator-prey with non-monotonic rewards. The simulation results indicate that the proposed method with successor features and knowledge transfer can guide the agents' exploration around dangerous states in the target domain, helping the agents collect more useful data about the target tasks and improving the overall performance of the multi-agent team.
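For the teacher-student transfer in the third contribution, a generic policy-distillation objective (our illustrative form; the weighting coefficient $\lambda$ and the buffer name are assumptions, not the thesis's notation) is:

\[
\mathcal{L}(\theta_S)=\mathcal{L}_{RL}(\theta_S)
+\lambda\,\mathbb{E}_{s\sim\mathcal{D}_{transfer}}
\Big[D_{KL}\big(\pi_T(\cdot\mid s)\,\big\|\,\pi_S(\cdot\mid s;\theta_S)\big)\Big],
\]

where $\pi_T$ denotes the teacher's advised outputs replayed from the transfer buffer $\mathcal{D}_{transfer}$ and $\pi_S$ is the student policy; the student thus learns from the environment through $\mathcal{L}_{RL}$ and from the teachers through the distillation term.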
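For the successor-feature construction in the final contribution, the standard decomposition (in the sense of Barreto et al.'s successor features; the multi-agent extension is the thesis's contribution) reads:

\[
r_{\mathbf{w}}(s,\mathbf{a},s')=\boldsymbol{\phi}(s,\mathbf{a},s')^{\top}\mathbf{w},
\qquad
\boldsymbol{\psi}^{\pi}(s,\mathbf{a})=\mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{\infty}\gamma^{t}\boldsymbol{\phi}(s_t,\mathbf{a}_t,s_{t+1})\,\Big|\,s_0=s,\,\mathbf{a}_0=\mathbf{a}\Big],
\qquad
Q_{\mathbf{w}}^{\pi}(s,\mathbf{a})=\boldsymbol{\psi}^{\pi}(s,\mathbf{a})^{\top}\mathbf{w}.
\]

A jumpstart policy for a target task with weight vector $\mathbf{w}'$ can then be obtained from pre-trained source policies, e.g. by generalized policy improvement: $\pi(s)\in\arg\max_{\mathbf{a}}\max_{i}\boldsymbol{\psi}^{\pi_i}(s,\mathbf{a})^{\top}\mathbf{w}'$.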
Keywords/Search Tags: multi-agent system, deep reinforcement learning, multi-agent reinforcement learning, efficient exploration, multi-agent transfer reinforcement learning