| With the rapid development of industrial and academic research on the sixth-generation (6G) mobile communication system, it is widely expected that 6G will satisfy extreme performance requirements, such as ultra-high rate and ultra-high density, beyond those of the fifth-generation (5G) mobile communication system, and will support novel applications such as digital twins and smart industry, thereby integrating the digital and physical worlds. Under this vision, the non-terrestrial network (NTN), consisting of different satellites and unmanned aerial vehicles (UAVs), is not constrained by landforms and has been regarded as a key enabling technology for providing global, full-time coverage in the future 6G system. However, because the NTN involves multiple nodes that differ in mobility, available resources and services, its deployment scenarios are varied and complex. Adaptive resource management is therefore necessary to sustain stable long-term overall performance for the NTN. Although 6G research reports broadly agree on deploying intelligent functions, implementing global or even local intelligence is hindered by the differing mobility, available resources and computing capabilities of the flying nodes, which has become an urgent issue for intelligent resource management in NTNs. With the objective of optimizing the long-term performance of the NTN, this dissertation adopts deep reinforcement learning (DRL) to solve radio resource optimization problems in typical NTN deployment scenarios, such as UAV networks, low-earth-orbit (LEO) satellites and geostationary-earth-orbit (GEO) satellites. To this end, this dissertation focuses on the design and deployment of DRL-based decision blocks, the collaboration between different nodes and the orchestration of decision-making tasks. Accordingly, this dissertation includes four main
research topics: (1) DRL-based distributed multi-user access control in aerial networks; (2) collaborative DRL for multi-user sub-channel allocation in LEO satellite communications; (3) multi-time-scale DRL for LEO earth-fixed cells in space networks; (4) multi-tier DRL for resource optimization in space-air-integrated networks. The specific research contents and main contributions of this dissertation are summarized as follows. First, considering that the flying base stations move along pre-configured orbits, a distributed DRL-based decision-making architecture is proposed, in which multiple users collaborate with a trainer node in the ground backhaul network to perform decision-making tasks. To develop a multi-user access control mechanism for UAV networks, this dissertation defines a handover cost function and formulates an optimization problem that balances long-term throughput against the number of handovers. To optimize long-term performance, a DRL-based decision-making block is deployed at the user side to capture the variation patterns hidden in the environment and to make access decisions. In addition, the trainer node updates the DRL model parameters using the interaction experiences collected from users, so that the model parameters summarize the variation patterns of the environment. Based on these designs, a user-driven distributed DRL-based algorithm is proposed for multi-user access control. Simulation results show that the proposed algorithm achieves throughput approaching the optimal result obtained by a search-based algorithm, and that it achieves different trade-offs between throughput and the number of handovers by adjusting the handover cost. Next, since capturing the variation patterns of the environment is difficult due to the rapid movement of LEO satellites, a collaborative DRL architecture composed of an
inter-satellite collaboration block and a satellite-ground collaboration block is developed. For a multi-beam LEO satellite downlink transmission scenario, a rate-satisfaction utility function is defined based on the transmission rate and sub-channel satisfaction, and an optimization problem is formulated to maximize the minimum utility among users with respect to the sub-channel allocation decisions. Considering the periodic movement pattern and limited computing capability of the LEO satellite, collaboration among LEO satellites and between the LEO satellite and users is proposed, so that the powerful computing capabilities of ground users can be fully utilized to perform DRL model training and inference tasks for the LEO satellite. Based on this collaboration, a collaborative DRL algorithm is proposed for multi-user sub-channel allocation. Simulation results show that the proposed schemes achieve superior transmission rate and user satisfaction compared with other DRL-based benchmarks in different simulation scenarios. Furthermore, since the timely beam adjustment required by the earth-fixed cell places urgent demands on computing capability, a multi-time-scale DRL-based architecture is proposed for the LEO satellite and ground users. Considering the joint optimization of beam direction and sub-channel allocation in the earth-fixed-cell scenario, a multi-time-scale resource configuration mechanism is investigated, in which the LEO satellite and the user equipment (UE) make their respective decisions on control cycles of different time scales. Additionally, this dissertation formulates a sub-channel minimization problem subject to the user's receiving-rate demand. To monotonically improve the policies of the LEO satellite and the user, a multi-time-scale multi-agent DRL algorithm based on sequential updating is proposed. Furthermore, this dissertation provides an
analytical foundation for the proposed multi-time-scale multi-agent DRL algorithm, together with a convergence analysis and the corresponding convergence error bound. Simulation results show that the proposed algorithm outperforms the other DRL-based benchmarks in terms of throughput. Moreover, compared with the beam-searching-based and periodical-beam-angle-compensation algorithms, the proposed algorithm efficiently balances user rate satisfaction, the number of utilized sub-channels and computing complexity. Finally, considering that optimizing the overall performance of a multi-tier NTN composed of a GEO satellite and UAVs leads to a high-dimensional decision space, a DRL-based collaborative decision-making architecture is developed for nodes at different tiers. For a UAV-relay-assisted multi-beam GEO satellite downlink transmission scenario, this dissertation defines a sum-rate maximization problem with respect to the spectrum allocation of the GEO satellite, the trajectory design of the UAV relays and the access decisions of users. The high-dimensional optimization is then decomposed into sub-problems for different nodes according to the relationships between the optimization variables, and corresponding DRL decision-making blocks are developed. Additionally, the leader-follower architecture and federated learning are integrated into the classical DRL algorithm to develop a generalized multi-tier DRL decision-making architecture, from which joint, individual, hierarchical and federated multi-tier DRL-based algorithms can be derived as special cases according to the practical computing capability and signaling constraints. Simulation results show that the proposed algorithm enables different nodes to collaboratively optimize the overall performance under different decision-space dimensions, and strikes an efficient trade-off between the decision-space dimension and the overall performance. |
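
As a rough illustration of the trade-off between throughput and handovers described for the access control problem, the sketch below computes a discounted long-term return in which each change of serving node is charged a fixed cost. The function names `handover_aware_return` and `count_handovers`, the per-step reward shape (throughput minus a cost whenever the serving node changes) and the discount factor are assumptions made for illustration only; the abstract does not give the dissertation's actual handover cost function.

```python
from typing import Sequence


def handover_aware_return(
    throughputs: Sequence[float],
    serving_nodes: Sequence[int],
    handover_cost: float,
    discount: float = 0.9,
) -> float:
    """Discounted return where every change of serving node costs `handover_cost`.

    Hypothetical reward shape (an assumption, not the dissertation's definition):
        r_t = throughput_t - handover_cost * [node_t != node_{t-1}]
    """
    if len(throughputs) != len(serving_nodes):
        raise ValueError("throughputs and serving_nodes must have equal length")

    total = 0.0
    weight = 1.0      # discount ** t, updated incrementally
    previous = None   # no handover is charged on the first step
    for rate, node in zip(throughputs, serving_nodes):
        switched = previous is not None and node != previous
        total += weight * (rate - (handover_cost if switched else 0.0))
        weight *= discount
        previous = node
    return total


def count_handovers(serving_nodes: Sequence[int]) -> int:
    """Number of times the serving node changes along the trajectory."""
    return sum(a != b for a, b in zip(serving_nodes, serving_nodes[1:]))
```

Under this assumed reward, raising `handover_cost` lowers the return of trajectories that switch nodes often, so a policy trained on it would be pushed toward fewer handovers at some cost in throughput. This is the kind of adjustable trade-off the simulation results refer to.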