Recently, cellular-connected UAVs (Unmanned Aerial Vehicles) have received extensive attention from many fields as aerial users. Because their communication is based on the cellular network, cellular-connected UAVs offer ubiquitous accessibility, easy deployment and use, fast data transmission, robust navigation, and low cost, so they are widely applied in logistics and delivery, search and rescue, power-line inspection, building inspection and monitoring, and similar tasks. A central problem for cellular-connected UAV users is optimizing their flight paths while completing tasks quickly. In the communication system of cellular-connected UAVs, the UAV trajectory is affected by factors such as flight time, energy limitations, air-ground interference, and 3D coverage.

Traditional path planning methods for cellular-connected UAVs include graph theory, convex optimization, and classical path planning algorithms. These methods achieve good results in many scenarios, but they adapt poorly to complex environments, have high algorithmic complexity, and make safety difficult to guarantee. In particular, they require an accurate end-to-end channel model between the UAV and the cell, and closed-form communication expressions are difficult to derive. Deep Reinforcement Learning (DRL) offers clear advantages here: it needs no end-to-end communication model and can adapt to complex environments. The UAV only needs to interact with the environment to learn, dynamically adjusting its position and flight direction to satisfy the various constraints of the cellular connection and thus complete path planning. However, early DRL methods are unsuitable for the multi-target cruise task of a single cellular-connected UAV and the cooperative target assignment task of multiple UAVs, because of sparse rewards and the inability to realize multi-UAV cooperation.

Based on the above analysis, the research content of this paper is as follows: (1) For the multi-target task of a cellular-connected UAV, Hierarchical Reinforcement Learning (HRL) and deep reinforcement learning are combined to analyze path planning under cellular connectivity, collision constraints, and task sequencing. (2) For tasks involving multiple cellular-connected UAVs, Multi-Agent Reinforcement Learning (MARL) is used to analyze multi-UAV path planning under cellular connectivity, collision constraints, and dynamic target assignment.

(1) In the multi-target cruise task of a cellular-connected UAV, general reinforcement learning methods designed for the single-target case cannot effectively handle the multi-target optimization problem. This paper proposes a combination of HRL and DRL to solve the multi-target trajectory optimization problem of cellular-connected UAVs. In brief, path planning is divided into two layers: one layer uses HRL to select the target point, and the other uses DRL to plan the path to the selected target. First, the communication system environment of the cellular-connected UAV is modeled, and an optimization problem is established to minimize the weighted sum of the UAV's task completion time and outage time. Then, the optimization problem is discretized and converted into a Markov decision process. Finally, the H-D3QN reinforcement learning algorithm is used to optimize the UAV trajectory, solving the multi-target cruise problem of the cellular-connected UAV.
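To make the two-layer design concrete, the following is a minimal PyTorch sketch of a hierarchical dueling Q-agent. It is illustrative only: the class names, network sizes, and state encoding are our assumptions, not the thesis implementation, and the training machinery (replay buffer, target networks, double-Q updates, exploration) is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DuelingQNet(nn.Module):
    """Dueling Q-network: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # action advantages A(s, a)

    def forward(self, state):
        h = self.feature(state)
        adv = self.advantage(h)
        return self.value(h) + adv - adv.mean(dim=-1, keepdim=True)

class HierarchicalAgent:
    """Two-layer decomposition (hypothetical): the high (HRL) layer picks the
    next target point, the low (DRL) layer picks a discretized flight direction."""
    def __init__(self, state_dim, n_targets, n_directions):
        self.n_targets = n_targets
        self.high_q = DuelingQNet(state_dim, n_targets)                # target selection
        self.low_q = DuelingQNet(state_dim + n_targets, n_directions)  # path planning

    @torch.no_grad()
    def act(self, state):
        target = self.high_q(state).argmax(dim=-1)        # which target to fly to
        goal = F.one_hot(target, self.n_targets).float()  # condition the low layer
        direction = self.low_q(torch.cat([state, goal], dim=-1)).argmax(dim=-1)
        return target.item(), direction.item()

agent = HierarchicalAgent(state_dim=8, n_targets=4, n_directions=8)
target, direction = agent.act(torch.randn(8))
```

In a full H-D3QN training loop, both layers would additionally maintain target networks and use the double-Q update; conditioning the low-level network on a one-hot encoding of the high-level choice is one common way to link the two layers.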
Experimental results show that, compared with single-layer reinforcement learning and traditional multi-target trajectory optimization methods, the proposed method handles the multi-target path planning problem of the cellular-connected UAV more effectively, with better path planning performance and efficiency.

(2) For the cooperative target assignment and path planning task of multiple cellular-connected UAVs, existing studies either focus only on the trajectory optimization of a single UAV, assume the UAVs perform their tasks at different times, or assume that the UAVs do not interfere with each other and that target positions are fixed and assigned in advance. None of this matches the real mission environment of cellular-connected UAVs. This paper therefore proposes a MARL method for UAV target assignment and path planning. Traditional target assignment and path planning typically rely on a central controller, whereas the MARL method allows multiple UAVs to cooperate toward an optimal solution without one. Specifically, we propose a MARL framework based on the Deep Deterministic Policy Gradient (DDPG) algorithm for the path planning and target assignment of cellular-connected UAVs. In this framework, each UAV makes its own path planning and target assignment decisions based on local sensing information so as to maximize the overall return. We also introduce a shared experience pool to improve the stability and learning efficiency of the algorithm. First, the environment of the multi-UAV-to-ground communication system is modeled, and the weighted sum of the total task completion time and total outage time of all UAVs is formulated as the optimization objective. Then, the optimization problem is discretized and transformed into a Markov game. Finally, our proposed MADDPG-based algorithms are used to optimize the UAV trajectories, solving the dynamic target assignment and path planning problem for multiple UAVs. Experimental results in a simulated environment show that the proposed MARL framework effectively accomplishes the path planning and target assignment tasks of multiple cellular-connected UAVs and adapts to complex, dynamic environments with good decision-making performance and robustness.
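The following is a minimal PyTorch sketch of the decentralized-actor/centralized-critic structure with a shared experience pool described above. The names, dimensions, and transition format are illustrative assumptions rather than the thesis code, and the DDPG/MADDPG update rules (target networks, soft updates, the policy-gradient step through the critic) are omitted for brevity.

```python
import random
from collections import deque
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized actor: maps one UAV's local observation to its action."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh())  # continuous action in [-1, 1]

    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized critic: scores the joint observations and actions of all
    UAVs; used during training only (centralized training, decentralized
    execution)."""
    def __init__(self, n_agents, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents * (obs_dim + act_dim), hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, all_obs, all_acts):
        # all_obs: (batch, n_agents * obs_dim); all_acts: (batch, n_agents * act_dim)
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

class SharedReplayBuffer:
    """Shared experience pool: all UAVs push joint transitions into one buffer."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, obs, acts, rewards, next_obs):
        self.buffer.append((obs, acts, rewards, next_obs))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

# Decentralized execution: each UAV acts from its own local observation.
n_agents, obs_dim, act_dim = 3, 10, 2
actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
critic = CentralCritic(n_agents, obs_dim, act_dim)
obs = [torch.randn(obs_dim) for _ in range(n_agents)]
acts = [actor(o) for actor, o in zip(actors, obs)]
q = critic(torch.cat(obs).unsqueeze(0), torch.cat(acts).unsqueeze(0))
```

The critic sees the joint observation-action pair only during training; at execution time each UAV needs only its own actor and local observation, which is what removes the need for a central controller.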