
Research On Multi-UAV Path Planning Based On Reinforcement Learning

Posted on: 2024-09-12    Degree: Master    Type: Thesis
Country: China    Candidate: Y X Tian    Full Text: PDF
GTID: 2542307079455414    Subject: Information and Communication Engineering
Abstract/Summary:
As unmanned aerial vehicles (UAVs) are increasingly deployed in civilian and military applications, research on UAV technology has become correspondingly urgent. The quadcopter, one common type of UAV, is widely used in search and rescue, aerial photography, environmental monitoring, and industrial inspection thanks to its agile flight capabilities. Path planning is an essential foundation for the safe and efficient execution of UAV tasks. This thesis divides multi-UAV path planning into two tasks: aggregation and allocation. Reinforcement learning (RL) algorithms are well suited to multi-UAV path planning because of their strong adaptability and generalization. However, current deep reinforcement learning (DRL) algorithms applied to multi-UAV path planning suffer from inherent limitations such as high computational complexity, long training times, sparse rewards, and poor coordination. Against this background, the main contributions of this thesis are as follows.

First, to address the trade-off in current DRL-oriented UAV simulators, which sacrifice realism for high sample throughput, this thesis builds a custom Gym environment for multiple quadcopter UAVs on top of PyBullet. The simulation environment is constructed from experimentally identified aerodynamic effects in the quadcopter equations of motion, and the UAV observation and action spaces are abstracted to improve the fidelity and usability of the environment. The result is a 3D simulation environment for multiple quadcopter UAVs, whose realism is validated in order to assess the transferability of the subsequent algorithms to the real world.

Second, to address the long training times, poor generalization, and sparse rewards of existing DRL approaches, this thesis proposes an improved twin delayed deep deterministic policy gradient (ITD3) algorithm, based on the N-step return and prioritized experience replay, to solve path planning in aggregation tasks, and designs the corresponding reward functions. The N-step return improves the estimate of the target value function by reducing update variance and accelerating convergence. Prioritized experience replay further accelerates convergence and raises the success rate by sampling non-uniformly: transitions in the replay buffer are drawn according to a priority evaluated from their temporal-difference (TD) error. On top of the environment constructed in this thesis, a suitable action space, state space, reward function, and training network are designed for path planning in aggregation tasks. Simulation results demonstrate the effectiveness and generalization of the model and validate the design of the reward function.

Third, to address the situational-perception errors caused by the coupling of task allocation and path planning in allocation tasks, this thesis proposes a new hierarchical distributed model based on the Deep Q-Network (DQN) and the ITD3 algorithm. In the dynamic path planning process of the DQN-ITD3 model, each UAV runs an independent ITD3 model for path planning, while the DQN acts as a dynamic task scheduler that, through four heuristic rules, outputs each UAV's current task to the lower ITD3 layer. A suitable action space, state space, reward function, and training network are likewise designed for the upper DQN network. Comparative simulation experiments show the effectiveness and generalization of the DQN-ITD3 model in allocation tasks and justify the DRL hyperparameter choices made in this thesis.
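The first contribution above describes a Gym-style multi-quadcopter environment built on PyBullet. As a rough illustration of the interface involved (not the thesis's actual environment), the sketch below follows the classic Gym reset/step contract with a dependency-free kinematic stub standing in for the PyBullet physics; all class and method names are illustrative.

```python
class MultiQuadEnv:
    """Gym-style skeleton for a multi-quadcopter environment.

    A real implementation would replace `_integrate` with PyBullet
    physics (stepping the simulation and applying motor/aerodynamic
    models); a simple kinematic stub keeps this sketch self-contained.
    """

    def __init__(self, n_uavs=3, dt=0.02):
        self.n_uavs = n_uavs
        self.dt = dt          # simulation time step in seconds
        self.positions = None

    def reset(self):
        # All UAVs start hovering at 1 m; observation = stacked positions.
        self.positions = [[0.0, 0.0, 1.0] for _ in range(self.n_uavs)]
        return self._observe()

    def step(self, actions):
        # `actions`: one 3-D velocity command per UAV (an abstracted
        # action space, in the spirit of the thesis).
        self.positions = [self._integrate(p, a)
                          for p, a in zip(self.positions, actions)]
        obs = self._observe()
        reward = 0.0   # a task-specific reward function would go here
        done = False
        return obs, reward, done, {}

    def _integrate(self, pos, vel):
        # Euler step of the kinematic stub.
        return [p + v * self.dt for p, v in zip(pos, vel)]

    def _observe(self):
        return [list(p) for p in self.positions]
```

The same reset/step interface is what DRL libraries expect, which is why wrapping the physics backend this way makes the environment reusable across algorithms.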
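The N-step return used in ITD3 replaces the one-step TD target with discounted rewards accumulated over n steps plus a bootstrapped value estimate. A minimal sketch (function name and signature are illustrative, not from the thesis):

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """G_t = r_t + g*r_{t+1} + ... + g^{n-1}*r_{t+n-1} + g^n * Q(s_{t+n}, a_{t+n}).

    `rewards` holds the n consecutive rewards along a trajectory segment;
    `bootstrap_value` is the target critic's estimate at the n-th next state.
    """
    g = bootstrap_value
    # Fold backwards so each reward is discounted the correct number of times.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, with gamma = 0.5, rewards [1, 1, 1], and a bootstrap value of 10, this yields 1 + 0.5 + 0.25 + 1.25 = 3.0. Averaging over n real rewards before bootstrapping is what reduces the variance of the target and speeds up convergence, as the abstract notes.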
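Prioritized experience replay, as described above, draws transitions with probability proportional to a priority derived from the TD error. A minimal list-based sketch under that scheme (practical implementations use a sum-tree for O(log N) sampling and importance-sampling weights to correct the induced bias, both omitted here for brevity; all names are illustrative):

```python
import random

class PrioritizedReplayBuffer:
    """Replay buffer that samples in proportion to |TD error|^alpha."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # 0 = uniform, 1 = fully greedy on TD error
        self.buffer = []
        self.priorities = []

    def add(self, transition, td_error):
        # Small epsilon keeps zero-error transitions sampleable.
        p = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.buffer) >= self.capacity:
            # Evict the oldest transition once the pool is full.
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size):
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        # Non-uniform sampling: high-TD-error transitions are drawn more often.
        idx = random.choices(range(len(self.buffer)), weights=probs, k=batch_size)
        return [self.buffer[i] for i in idx], idx
```

After each critic update, the sampled transitions' priorities would be refreshed with their new TD errors, so the buffer keeps focusing on the transitions the critic currently predicts worst.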
Keywords/Search Tags:Multi-UAV, Path Planning, Deep Reinforcement Learning, Dynamic Task Assignment