With the growth in the number of communication users, traditional terrestrial communication networks can no longer meet people's needs for ultra-reliable, high-quality and diverse wireless communication. As a typical means of aerial communication, UAV communication can provide users with highly reliable, flexible, easily deployed and low-cost communication services, and is regarded as a supplement to terrestrial communications and a component of future aerial network solutions. Driven by the needs of coverage enhancement and emergency communication, the single-UAV relay and the UAV-swarm base station are two typical communication scenarios. Resource management is a key technology in UAV communication systems. Traditional wireless resource management techniques derive their policies from complete and accurate environmental state information; they are therefore difficult to implement in practice, computationally expensive, and unable to adapt their policies to changes in the environment. For this reason, this dissertation focuses on intelligent wireless resource management techniques for two typical communication scenarios in UAV communication systems. First, a reinforcement learning based hierarchical intelligent decision-making (RLB-HIDM) architecture is studied. Then, based on this architecture, communication resource management schemes are developed for the single-UAV relay and the UAV-swarm base station: a three-step intelligent solution (TSIS) is presented for resource management of the single-UAV relay communication link; a resource management algorithm based on fast reinforcement learning is proposed for the UAV-swarm base station scenario; and a clustering-aided multi-agent reinforcement learning (CA-MARL) scheme is proposed to manage the resources of ground-user communication under the UAV-swarm base station, enabling the UAV communication system to realize intelligent and autonomous dynamic resource
management with low complexity.

The main research contributions and innovations of this dissertation are as follows:

(1) For wireless resource management in the two typical communication scenarios of single-UAV relay and UAV-swarm base station, the RLB-HIDM architecture is studied and the decision-making scheme of each layer is designed. Compared with traditional architectures, this architecture does not require complete prior knowledge of the environment; it needs only the feedback obtained from interacting with the environment. It explores optimal policies through trial and error and is therefore suitable for dynamic environments. Moreover, its computational complexity is much lower than that of traditional architectures.

(2) For the joint optimization of transmit power and flight path over the single-UAV relay communication link, a three-step intelligent solution (TSIS) is presented that converts the high-dimensional joint decision-making problem into low-dimensional sub-problems. In the first step, dimension reduction is carried out: a model parameter reconstructing machine learning (MPR-ML) algorithm is proposed to optimize the flight altitude of the UAV. In the second step, a two-dimensional flight trajectory design algorithm based on ant colony optimization is presented to solve the NP-hard trajectory problem with low complexity. In the third step, a power control algorithm based on the prioritized-sampling twin delayed deep deterministic policy gradient (PS-TD3) is proposed to make the policy converge quickly to the optimum. Simulation results show that the proposed scheme and algorithms have clear advantages over traditional algorithms in terms of performance, convergence speed and computational complexity.

(3) For swarm deployment and power optimization of inter-UAV communication links in the UAV-swarm base station scenario, this dissertation proposes a deep Q-network (DQN) based algorithm for the joint decision of swarm mode and transmit power, together with three improved DQN algorithms that accelerate convergence. In order to make the
joint decision of swarm mode and power control adapt to a changing environment, two fast reinforcement learning algorithms are further proposed, based on meta deep Q-network (Meta-DQN) and model value expansion deep Q-network (MVE-DQN) respectively. Compared with the DQN algorithm, both greatly reduce the number of samples required for training; the Meta-DQN algorithm achieves few-shot learning, while the MVE-DQN algorithm converges to the optimal solution with a higher probability. Simulation results verify the effectiveness of these algorithms.

(4) For the joint optimization of node deployment, user association, power control and time-frequency resource block allocation for ground-user communication under the UAV-swarm base station, a CA-MARL scheme is proposed. It decouples the high-dimensional joint optimization problem into three sub-optimization problems and solves them in two stages, thereby handling the NP-hard joint optimization with low complexity. In Stage 1, the pre-deployment stage, an unsupervised clustering algorithm based on modified expectation-maximization (MEM) is first proposed, which transforms the user association problem into a UAV-cluster matching problem and reduces the decision dimension; a UAV-cluster matching algorithm based on Kuhn-Munkres (KM) is then studied to complete user association and UAV pre-deployment. In Stage 2, the UAV position fine-tuning stage, this dissertation presents a multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm to determine the trajectory and transmit power of each UAV; this algorithm reaches the optimal policy more easily via a low-bias Q-value estimation. A time-frequency resource block allocation algorithm based on multi-agent advantage actor-critic (MAA2C) is also proposed; its advantage-based update makes training converge more reliably and effectively resists typical interference. Simulation results show that the CA-MARL scheme proposed in this dissertation achieves good performance with low complexity, and the
performance of MATD3 and MAA2C is better than that of traditional reinforcement learning algorithms.
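The reinforcement learning principle running through these contributions, learning a policy purely from interaction feedback rather than from a complete environment model, can be illustrated with a minimal tabular Q-learning sketch. This is only an illustrative toy: the dissertation's algorithms use deep networks (DQN, TD3 variants) over far larger UAV state and action spaces, and the corridor MDP, hyperparameter values and variable names below are assumptions made for the example, not taken from the dissertation.

```python
import random

# Toy MDP: a 5-cell corridor; the agent starts at cell 0 and earns a
# reward of 1.0 only on reaching the goal cell 4. The agent never sees
# the transition model; it learns purely from (state, action, reward)
# feedback, mirroring the "trial and error" property claimed for the
# RLB-HIDM architecture.
N_STATES = 5          # cells 0..4; cell 4 is the terminal goal
ACTIONS = [-1, +1]    # move left / move right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1   # learning rate, discount, exploration

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # tabular Q-value estimates

def step(state, action_idx):
    """Apply an action; clamp to the corridor; reward only at the goal."""
    nxt = max(0, min(N_STATES - 1, state + ACTIONS[action_idx]))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

random.seed(0)
for episode in range(300):
    s, done = 0, False
    while not done:
        # epsilon-greedy exploration: occasional random attempts
        if random.random() < EPS:
            a = random.randrange(2)
        else:
            a = Q[s].index(max(Q[s]))
        s2, r, done = step(s, a)
        # Q-learning update toward the bootstrapped Bellman target
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

# Greedy policy per state: 1 means "move right" (toward the goal)
policy = [Q[s].index(max(Q[s])) for s in range(N_STATES)]
print(policy)
```

After training, the greedy policy moves right in every non-terminal cell, even though the agent was never given the environment dynamics. The deep RL methods in the dissertation replace the Q-table with a neural network so the same update principle scales to continuous UAV positions and power levels.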