Unmanned aerial vehicles (UAVs) are a promising technology for future wireless mobile communication, offering the ultra-reliable, low-latency communication that is essential for the rapid deployment of sixth-generation networks. However, UAV communication systems are constrained by limited onboard power, bandwidth, and battery capacity. These limitations make it challenging to maximize downlink throughput and obtain optimal resource allocation schemes. Traditional optimization methods also fall short in resource allocation and trajectory design. First, offline optimization schemes assume that perfect channel information can be derived explicitly from radio propagation. Second, most of the optimization problems are highly non-convex. Finally, supervised and unsupervised learning require sufficient prior data samples. Deep reinforcement learning has emerged as an effective way to address these challenges, in particular the problem of real-time decision-making, because it enables UAVs to learn continuously how to make decisions in dynamic communication environments. In this thesis, we focus on multi-UAV-assisted wireless communication systems with the goal of maximizing the throughput of ground users. We use deep reinforcement learning algorithms to conduct an in-depth study of the joint optimization of user scheduling, battery energy consumption, power allocation, bandwidth allocation, and 3D trajectory design.

This thesis first investigates a single-UAV-assisted wireless communication system, focusing on the challenges it faces in ground-user link transmission scheduling, power allocation, and 3D trajectory design of the UAV. We use a deep Q-network (DQN) algorithm to maximize the system throughput. However, DQN is poorly equipped to maintain stable data transmission under frequent changes in the external environment, and it is prone to overestimation. To address these issues, we employ the double deep Q-network (D-DQN) algorithm, which in turn often leads the UAV to fly out of bounds along its 3D trajectory. Finally, we use the dueling deep Q-network (DUL) algorithm, which combines a pair of functions, the state-value and advantage functions of the dueling architecture. On this basis, we propose a stepwise optimization algorithm that achieves the maximum-throughput UAV strategy in an uncontrollable dynamic environment.

To scale to more UAVs and serve a larger number of users, this thesis then investigates a multi-input single-output multi-UAV-assisted wireless communication system. The focus is on the challenges this system faces in ground-user link transmission scheduling, altitude dynamics, bandwidth allocation, and trajectory design for multiple UAVs. We propose a solution based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm and the Ray framework for centralized distributed computing, and use it to study trajectory optimization schemes that maximize the throughput of the multi-UAV communication system. Each UAV takes ground-user satisfaction and transmission rate as its rewards in a distributed Markov decision process, which enables cooperation among multiple UAVs. Simulation results show that the proposed scheme effectively resolves the harmful competition among multiple UAVs that degrades communication service quality. Furthermore, it maximizes the system throughput and serves more ground users through a well-designed pairing with ground users, bandwidth allocation, and flight trajectory scheme.
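
To illustrate the overestimation issue and the value/advantage combination mentioned above, the following minimal sketch contrasts the standard DQN target, the double DQN target, and the dueling aggregation in their textbook forms. The function names, the discount factor, and the toy Q-values are assumptions chosen for illustration; they are not taken from the thesis, whose network architecture and reward design are not specified here.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN target: the target network both selects and
    evaluates the next action, which tends to overestimate values."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target,
                      gamma=0.99, done=False):
    """Double DQN target: the online network selects the next action,
    and the target network evaluates that action."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)),
                      key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def dueling_q(state_value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


if __name__ == "__main__":
    # Hypothetical Q-values for three actions in the next state.
    q_online = [1.0, 2.0, 0.5]
    q_target = [3.0, 1.5, 0.2]
    print(dqn_target(1.0, q_target, gamma=0.5))
    print(double_dqn_target(1.0, q_online, q_target, gamma=0.5))
    print(dueling_q(1.0, [1.0, 2.0, 3.0]))
```

In this toy case the double DQN target is smaller than the DQN target, because the action preferred by the online network is not the one with the largest target-network value.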