
Reinforcement Learning Based Multi-Agent Path Finding

Posted on: 2024-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhao
Full Text: PDF
GTID: 2568306932462274
Subject: Cyberspace security
Abstract/Summary:
With the development of artificial intelligence, multi-agent systems have been widely used in military, logistics, rescue, and other fields. Multi-agent path finding (MAPF) is the basis of multi-agent systems and therefore has significant research value. MAPF aims to find paths for multiple agents from their start positions to their goal positions without any conflict. Classical MAPF methods can only be applied in known, fixed environments, and their path-planning efficiency is limited. Because of its capacity for autonomous learning, reinforcement learning is widely used in automatic systems and adapts well to changing environments. Inspired by this, this thesis investigates reinforcement learning-based MAPF methods, aiming to improve learning efficiency and environmental adaptability. The main work of this thesis is as follows:

(1) To address the problem of sparse rewards, a MAPF method based on curriculum learning is proposed. It decomposes the MAPF task into sub-tasks ordered from easy to difficult, which alleviates the impact of sparse rewards on performance and increases learning efficiency. The method arranges three curricula for the task: dense individual rewards are used in the first two curricula to make exploration more directed, and team rewards are used in the final curriculum to produce cooperative strategies. Experiments on grid worlds with random obstacles show that the proposed method outperforms state-of-the-art learning-based methods, especially in complex environments with high obstacle density.

(2) To address the problem of the exploding policy space, a MAPF method based on a sequence model is proposed. By transforming the MAPF problem into a sequential decision problem, it reduces the policy space from an exponential to a linear scale and improves the scalability of the method. The method establishes a sequential decision paradigm in which agents select actions in a certain order, each according to its own observation and the actions of the agents preceding it. Experiments on grid worlds with random obstacles show that the proposed method has a considerable lead over existing learning-based methods in environments with a large number of agents.
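
The sequential decision paradigm in contribution (2) can be illustrated with a minimal sketch. It is not the thesis's model. The five-move action set, the index-based default ordering, and all names (`ACTIONS`, `sequential_step`, `toy_policy`, and the two counting helpers) are assumptions introduced here for illustration, and `toy_policy` is a hand-written placeholder rather than a learned sequence model. The sketch shows only the general idea: each agent picks an action from its own observation plus the actions already chosen by preceding agents, so the number of outputs scored per step grows linearly with the number of agents instead of exponentially.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed five-move grid action set; the abstract does not specify one.
ACTIONS: Tuple[str, ...] = ("stay", "up", "down", "left", "right")

# A policy maps (own observation, actions of preceding agents) to one action.
Policy = Callable[[object, Tuple[str, ...]], str]


def joint_policy_outputs(num_agents: int, num_actions: int = len(ACTIONS)) -> int:
    """Number of joint actions a single centralized policy would have to score."""
    return num_actions ** num_agents


def sequential_policy_outputs(num_agents: int, num_actions: int = len(ACTIONS)) -> int:
    """Number of outputs scored in total when agents decide one after another."""
    return num_agents * num_actions


def sequential_step(
    observations: Sequence[object],
    policy: Policy,
    order: Optional[Sequence[int]] = None,
) -> List[Optional[str]]:
    """Select one action per agent, visiting agents in the given order."""
    n = len(observations)
    order = list(range(n)) if order is None else list(order)
    actions: List[Optional[str]] = [None] * n
    preceding: List[str] = []  # actions already chosen, in decision order
    for agent in order:
        action = policy(observations[agent], tuple(preceding))
        actions[agent] = action
        preceding.append(action)
    return actions


def toy_policy(observation: object, preceding_actions: Tuple[str, ...]) -> str:
    """Placeholder (not learned): first action no preceding agent has taken."""
    for action in ACTIONS:
        if action not in preceding_actions:
            return action
    return "stay"
```

Under these assumptions, eight agents give `joint_policy_outputs(8) == 390625` joint actions, while `sequential_policy_outputs(8) == 40` outputs are scored when the agents decide in sequence.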
Keywords/Search Tags: Multi-Agent Path Finding, Multi-Agent Reinforcement Learning, Curriculum Learning, Sequence Model