Compared with multiple UAVs, a single UAV faces stronger limitations in its application scenarios and lower task-execution efficiency, whereas multi-UAV collaboration offers better stability and adaptability. Research on multi-UAV collaboration technology is therefore critical, and multi-UAV path planning is its cornerstone. The Dijkstra and A* algorithms are classic path-planning algorithms: they can plan a reasonable path in a known environment, but their performance in unknown environments is unsatisfactory. Intelligent optimization algorithms such as the Ant Colony Algorithm, Particle Swarm Optimization, and the Genetic Algorithm are also used in UAV path planning; they typically search the task space for an optimal path, but their models are complex and computationally intensive, they cannot handle random environments, and real-time path planning is difficult for them. To address the shortcomings of these algorithms, and in light of the specific task types involved in UAV collaboration, this paper studies multi-UAV path planning in unknown environments based on reinforcement learning. It summarizes multi-UAV mission scenarios into two types, the convergent mission and the assignment mission, which cover the common multi-UAV collaboration scenarios. The main work and innovations of this paper are as follows:

(1) For multi-UAV path planning in the convergent mission scenario, this paper builds on the deep reinforcement learning algorithm DQN and draws on the idea of the Artificial Potential Field (APF) algorithm to design the reinforcement reward, proposing the APF-DQN model. The model overcomes the slow convergence and poor performance caused by sparse reward signals in reinforcement learning and better guides the UAV to the convergence point along a shorter path. In addition, a training environment for the agents is developed on the OpenAI Gym platform. Comparative experiments between the APF-DQN model and the plain DQN algorithm show that APF-DQN converges faster and the UAV reaches the goal in fewer steps. A further comparison with the traditional Artificial Potential Field algorithm shows that APF-DQN better resolves the problems of unreachable target points and local minima, and plans shorter paths.

(2) For path planning in the multi-UAV task-allocation scenario, this paper divides the problem into two phases: task allocation and path planning. Task allocation adopts a self-organizing map (SOM) network model. To address the oscillation and non-convergence problems of the traditional SOM model when assigning targets to UAVs, this paper proposes a two-stage iterative improved SOM algorithm. Experimental results show that the improved SOM model better overcomes oscillation, converges more reliably, and retains the self-organizing characteristics of SOM. After each UAV is assigned its target, it uses the APF-DQN model for path planning in the unknown environment. To handle unpredictable conditions caused by dynamic task changes, such as target points changing or UAVs going offline, this paper uses the self-organizing map network to dynamically reassign tasks, and simulates these conditions in experiments. The simulations confirm the feasibility of dynamic reassignment. Through this model, the success rate of multi-UAV task completion is improved, which reflects, to a certain extent, the stability and adaptability of the UAV swarm.
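The APF-inspired reward design behind APF-DQN can be illustrated with a minimal sketch. It assumes the standard APF forms (a quadratic attractive potential toward the goal, a repulsive potential inside an obstacle influence radius) combined with potential-based reward shaping, r' = r + γΦ(s') − Φ(s); the gains `k_att`, `k_rep`, and radius `d0` are illustrative constants, not the thesis's actual reward function or parameters.

```python
import math

def apf_potential(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=2.0):
    """Total APF potential: quadratic attraction to the goal plus
    repulsion from obstacles closer than the influence radius d0."""
    u = 0.5 * k_att * math.dist(pos, goal) ** 2
    for obs in obstacles:
        d = math.dist(pos, obs)
        if 0 < d < d0:
            u += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return u

def shaped_reward(base_reward, pos, next_pos, goal, obstacles, gamma=0.99):
    """Potential-based shaping r' = r + gamma*Phi(s') - Phi(s), with Phi = -U,
    so steps that lower the APF potential earn a dense positive bonus."""
    phi_s = -apf_potential(pos, goal, obstacles)
    phi_next = -apf_potential(next_pos, goal, obstacles)
    return base_reward + gamma * phi_next - phi_s
```

A step toward the goal then receives a positive shaping bonus even when the environment's own reward is zero, which is one way a dense APF signal can counteract reward sparsity.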
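The training environment mentioned above would expose the Gym-style `reset`/`step` interface. The following dependency-free sketch mimics that interface for a single UAV converging on a grid; the grid size, four-action move set, and reward values are invented for illustration and are not the thesis's actual environment.

```python
class ConvergeEnv:
    """Toy Gym-style grid world (illustrative only): one UAV moves on an
    N x N grid toward a fixed convergence point."""
    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # up, down, right, left

    def __init__(self, size=10, goal=(9, 9)):
        self.size, self.goal = size, goal
        self.pos = (0, 0)

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        dx, dy = self.ACTIONS[action]
        # Clamp the move to the grid boundaries.
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        done = self.pos == self.goal
        # Sparse terminal reward with a small step cost: exactly the kind of
        # signal that APF-based shaping is meant to densify.
        reward = 10.0 if done else -0.1
        return self.pos, reward, done, {}
```

Keeping the environment to the plain `(observation, reward, done, info)` contract is what lets the same agent code train against Gym-registered environments.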
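The SOM-based assignment idea can be sketched in miniature. In this toy version, UAV positions act as neuron weights; each target pulls its nearest neuron (the winner) strongly and the winner's index neighbours weakly, with the learning rate and neighbourhood width decaying over epochs. This is a plain SOM-style assignment for illustration only, not the thesis's two-stage iterative improved algorithm, and all parameter values are hypothetical.

```python
import math
import random

def som_assign(uav_pos, targets, epochs=200, lr=0.5, sigma=1.0, seed=0):
    """Assign each 2-D target to a UAV by training a tiny 1-D SOM whose
    neuron weights are initialized at the UAV positions."""
    rng = random.Random(seed)
    w = [list(p) for p in uav_pos]  # neuron weights, one per UAV
    for e in range(epochs):
        decay = math.exp(-e / epochs)  # shrinks both lr and neighbourhood
        for t in rng.sample(targets, len(targets)):  # random presentation order
            win = min(range(len(w)), key=lambda i: math.dist(w[i], t))
            for i in range(len(w)):
                # Gaussian neighbourhood over neuron indices.
                h = math.exp(-((i - win) ** 2) / (2 * (sigma * decay) ** 2 + 1e-9))
                for d in range(2):
                    w[i][d] += lr * decay * h * (t[d] - w[i][d])
    # Final assignment: each target goes to its nearest trained neuron/UAV.
    return {tuple(t): min(range(len(w)), key=lambda i: math.dist(w[i], t))
            for t in targets}
```

The decaying neighbourhood is what the oscillation problem concerns: while it is wide, neurons are dragged toward several targets at once, and only as it collapses toward winner-take-all does the assignment settle.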