Unmanned Aerial Vehicles (UAVs) have come to play a key role in current wireless networks because of their low cost and flexible deployment. For UAV base stations, a significant problem is how to improve the performance of UAV-assisted communication systems through trajectory design. Much work has studied UAV trajectory design in different communication scenarios, but existing work faces three main challenges: 1) most of it adopts traditional optimization methods, which model and solve the problem on the assumption that the communication system parameters are known, whereas in practice user-side information such as user locations and channel parameters may be difficult to obtain or impossible to measure accurately; 2) some work adopts reinforcement learning, but most of the resulting trajectories are coarse because they are based on a discrete action space; 3) because of collisions, the unstable training environment caused by multiple UAVs, and the need for cooperation among the UAVs, it is difficult to extend existing research directly to multi-UAV-assisted communication scenarios. In view of these challenges, this paper proposes a continuous trajectory design strategy for multiple UAVs based on deep reinforcement learning and studies, across different scenarios, the multi-UAV trajectory design problem without knowledge of user-side information. The main contents are as follows: 1) First, for a simple uplink transmission scenario, the trajectory design problem in a continuous action space that maximizes the transmission task completion rate is studied. A Multi-agent Twin Delayed Deep Deterministic Policy Gradient (MA-TD3) algorithm is proposed, which adopts centralized training and distributed execution to design the trajectories of multiple UAVs. Simulation results show that the proposed algorithm can effectively design trajectories without knowing user-side information and that, compared with existing continuous trajectory design methods, it performs better in multi-UAV scenarios. 2) Second, because actual communication scenarios place high demands on timeliness, the trajectory design problem of minimizing the age of information (AoI) is further considered, and a multi-UAV cooperative trajectory design strategy (CO-MUTD) is studied. A distributed transmission protocol is proposed, and the UAV trajectories and the UAV-user connection strategy are jointly optimized. The joint optimization problem over this hybrid (discrete-continuous) strategy is transformed into a problem in a continuous action space so that it can be solved by reinforcement learning. Simulation results verify the feasibility and effectiveness of the proposed strategy.
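The MA-TD3 idea of centralized training with distributed execution can be illustrated by how a critic target is formed. The sketch below is a minimal pure-Python illustration of the standard TD3 ingredients (target policy smoothing, clipped double-Q, delayed actor updates) applied to a centralized critic that sees every UAV's observation and action. It is not the thesis's implementation: the function names, the flat list representation of observations and actions, the default hyperparameters, and the toy linear "networks" in the usage lines are all hypothetical stand-ins for trained neural networks.

```python
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Actor = Callable[[Vector], Vector]          # local observation -> action
Critic = Callable[[Vector, Vector], float]  # (joint obs, joint action) -> Q value


def smoothed_target_action(actor: Actor, obs: Vector, sigma: float,
                           noise_clip: float, a_low: float, a_high: float,
                           rng: random.Random) -> Vector:
    """TD3 target policy smoothing: add clipped Gaussian noise, then clip to the action bounds."""
    noisy = []
    for a in actor(obs):
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, sigma)))
        noisy.append(max(a_low, min(a_high, a + eps)))
    return noisy


def ma_td3_target(reward: float, done: bool, next_obs: Sequence[Vector],
                  target_actors: Sequence[Actor], target_q1: Critic,
                  target_q2: Critic, gamma: float = 0.99, sigma: float = 0.2,
                  noise_clip: float = 0.5, a_low: float = -1.0,
                  a_high: float = 1.0,
                  rng: Optional[random.Random] = None) -> float:
    """Critic target y for one agent, using a centralized critic over all UAVs."""
    rng = rng if rng is not None else random.Random(0)
    # Centralized training: the critic sees every UAV's observation and action.
    joint_obs = [x for obs in next_obs for x in obs]
    joint_act: Vector = []
    for actor, obs in zip(target_actors, next_obs):
        joint_act += smoothed_target_action(actor, obs, sigma, noise_clip,
                                            a_low, a_high, rng)
    # Clipped double-Q learning: use the smaller of the two target critics.
    q_next = min(target_q1(joint_obs, joint_act), target_q2(joint_obs, joint_act))
    return reward + gamma * (0.0 if done else 1.0) * q_next


def actor_update_due(step: int, policy_delay: int = 2) -> bool:
    """Delayed policy updates: actors (and target nets) update every policy_delay critic steps."""
    return step % policy_delay == 0


# Toy usage: two UAVs with hypothetical linear stand-ins for the target networks.
actors = [lambda o: [0.5, -0.5], lambda o: [2.0, 0.0]]   # 2.0 is clipped to 1.0
q1 = lambda s, a: sum(s) + sum(a)
q2 = lambda s, a: sum(s) + sum(a) + 1.0
y = ma_td3_target(1.0, False, [[1.0, 2.0], [3.0, 4.0]], actors, q1, q2,
                  gamma=0.9, sigma=0.0)
print(round(y, 6))  # 10.9
```

At execution time only the actors are needed, and each UAV runs its own actor on its local observation; the centralized critics are used only during training.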
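To make the age-of-information objective concrete, the sketch below uses a common slotted AoI model. The abstract does not specify its exact model, so this is an assumption: each user's age grows by one slot per slot and restarts at 1 in any slot where its update is delivered to a UAV. The quantity to minimize is then the age averaged over users and slots; a reinforcement learning agent could use its negative as a reward, which is also an assumption here. The names `step_ages` and `average_aoi` are hypothetical.

```python
from typing import List, Sequence, Set


def step_ages(ages: Sequence[int], served: Set[int]) -> List[int]:
    """Advance every user's AoI by one slot (assumed slotted model).

    A user whose update is delivered in this slot restarts at age 1;
    every other user's age grows by one slot.
    """
    return [1 if user in served else age + 1 for user, age in enumerate(ages)]


def average_aoi(initial_ages: Sequence[int], schedule: Sequence[Set[int]]) -> float:
    """Time- and user-averaged AoI over a schedule of served-user sets."""
    if not schedule or not initial_ages:
        raise ValueError("need at least one user and one slot")
    ages = list(initial_ages)
    total = 0
    for served in schedule:
        ages = step_ages(ages, served)
        total += sum(ages)
    return total / (len(ages) * len(schedule))


# Three users; one UAV serves a different user in each of three slots.
print(average_aoi([1, 1, 1], [{0}, {1}, {2}]))  # (5 + 6 + 6) / 9, about 1.889
```

Under this model, lowering the average requires visiting users often enough that no single age grows large, which is why trajectory and connection decisions have to be optimized together.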
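The abstract states that the hybrid trajectory-plus-connection decision is recast as a continuous action, but it does not say how. One common technique, shown here only as an illustrative assumption and not as the CO-MUTD method, is to let the actor output continuous components for motion plus one continuous score per user, and to decode the connection by taking the highest-scoring user. The action layout, `v_max`, the slot length `dt`, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def decode_hybrid_action(raw: Sequence[float], num_users: int,
                         v_max: float) -> Tuple[float, float, int]:
    """Map one continuous actor output to (speed, heading, connected user).

    Hypothetical layout: raw[0] in [-1, 1] -> speed in [0, v_max],
    raw[1] in [-1, 1] -> heading in [-pi, pi], and raw[2:] holds one score
    per user; the highest score selects the UAV-user connection.
    """
    if len(raw) != 2 + num_users:
        raise ValueError("expected 2 + num_users action components")
    speed = (raw[0] + 1.0) / 2.0 * v_max
    heading = raw[1] * math.pi
    scores = list(raw[2:])
    user = max(range(num_users), key=scores.__getitem__)
    return speed, heading, user


def move(x: float, y: float, speed: float, heading: float,
         dt: float) -> Tuple[float, float]:
    """Advance a UAV's horizontal position for one slot of length dt."""
    return x + speed * dt * math.cos(heading), y + speed * dt * math.sin(heading)


speed, heading, user = decode_hybrid_action([0.0, 0.5, 0.1, 0.9, -0.3], 3, 20.0)
print(speed, user)  # 10.0 1
```

Because every component the actor emits is continuous, a deterministic-policy method such as TD3 can be trained on it directly; the discrete choice appears only in the decoding step applied in the environment.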