
Resource Management Based On Deep Reinforcement Learning For Space-Air-Ground Networks

Posted on: 2024-05-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D H Deng
Full Text: PDF
GTID: 1528306944966489
Subject: Electronic Science and Technology
Abstract/Summary:
Limited by factors such as geographical environment and economic cost, terrestrial infrastructure cannot be deployed everywhere on Earth and therefore cannot provide fair, high-speed, and secure wireless access for all users. Building on traditional terrestrial networks, space-air-ground networks additionally incorporate air networks and space networks. Because the air and space segments offer flexible networking, high environmental adaptability, and wide coverage, space-air-ground networks can extend the coverage of terrestrial networks or provide emergency coverage after disasters. As a key technique for achieving global coverage of wireless communication, space-air-ground networks have become a widely acknowledged development trend for future networks. With the explosive growth of user demand, the available resources of each network segment in space-air-ground networks are extremely scarce and valuable. Moreover, the inherent heterogeneity and high mobility of the air and space segments further increase the complexity of resource management. Recently, machine learning has provided a novel approach to resource management and has thus attracted widespread attention. This thesis focuses on two basic scenarios in space-air-ground networks, namely unmanned aerial vehicle (UAV) assisted communication and space-air cooperative communication, and conducts research on resource management based on deep reinforcement learning. The main work and innovations are summarized as follows:

(1) A deep Q-network (DQN) based two-hop cooperation scheme is proposed for UAV-assisted communication networks. This scheme addresses two problems: the system's limited feedback capability, and the lack of current channel state information caused by system latency. Specifically, the scheme first establishes a two-hop cooperation structure, consisting of macro base station decisions and local fine-tuning, to reduce the feedback overhead. The DQN algorithm is then used to form a feedback adaptation strategy that allocates resources based on an estimate of the user distribution in the target area. The scheme is further extended to time-delay systems, where outdated channel state information is used to estimate the current optimal resource decision. Simulation results show that the proposed scheme can adjust its resource scheduling strategy according to the estimated user distribution, and that in time-delay systems it outperforms alternative schemes when the channel varies rapidly and the channel transition probabilities are concentrated.
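To make the DQN component concrete, the minimal sketch below (PyTorch) shows the two core pieces of a DQN agent choosing discrete resource-allocation actions: an epsilon-greedy policy over Q-values, and a Bellman-target training step using a replay buffer and a target network. All dimensions, layer sizes, and hyperparameters are illustrative assumptions, not the dissertation's actual configuration.

```python
# Minimal DQN sketch for discrete resource-allocation decisions.
# All sizes and hyperparameters are illustrative assumptions.
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 16, 8, 0.99  # assumed sizes

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, N_ACTIONS))

    def forward(self, s):
        return self.net(s)

q, q_target = QNet(), QNet()
q_target.load_state_dict(q.state_dict())
opt = torch.optim.Adam(q.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)  # stores (s, a, r, s_next, done) tuples

def act(state, eps=0.1):
    # Epsilon-greedy over the Q-values of all discrete actions.
    if random.random() < eps:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return q(torch.as_tensor(state).float()).argmax().item()

def train_step(batch_size=64):
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = map(torch.as_tensor,
                            zip(*random.sample(replay, batch_size)))
    # Bellman target: r + gamma * max_a' Q_target(s', a') for non-terminal s'.
    with torch.no_grad():
        target = r.float() + GAMMA * q_target(s2.float()).max(1).values \
                 * (1 - done.float())
    pred = q(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(pred, target)
    opt.zero_grad(); loss.backward(); opt.step()
    # Periodically: q_target.load_state_dict(q.state_dict())
```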
(2) A deep recurrent Q-network based UAV trajectory design and user access scheme is proposed for UAV-assisted communication networks. This scheme addresses the negative impact that unknown user trajectories have on the sequential decision problem. Specifically, the scheme first jointly formulates user trajectory prediction and resource optimization as a partially observable Markov decision process, and then remodels the state, observation, and action to reduce the complexity of resource management. Finally, it inserts a dimension-divided long short-term memory (LSTM) layer into the DQN structure to achieve synchronous prediction and optimization; a minimal sketch of such a recurrent Q-network appears after item (3) below. Simulation results show that the proposed scheme can adjust the UAV locations based on the predicted user trajectories, and that it outperforms the baseline schemes in terms of sum rate.

(3) For space-air cooperative communication networks based on a geostationary earth orbit satellite and UAVs, two schemes are proposed: an improved twin delayed deep deterministic policy gradient (TD3) based power and subchannel allocation scheme, and a TD3 recurrent network based power and subchannel allocation scheme. These schemes respectively address the high complexity of resource allocation in dense-user scenarios, and the missing immediate state caused by imperfect information reception. Specifically, the schemes first establish a cooperation structure with satellite-side user grouping and satellite resource allocation to reduce the feedback overhead, and then integrate techniques such as prioritized memory replay to improve the TD3 algorithm. In addition, an LSTM layer is used to compensate for the missing information and estimate the current optimal resource decision. Simulation results show that the proposed schemes can handle the huge policy space brought by dense-user scenarios and effectively improve the achievable rate, while alleviating the performance degradation caused by imperfect information reception.
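For item (2), the sketch below shows how a recurrent layer can be inserted into a Q-network so that the agent accumulates a history of partial observations before estimating Q-values, which is the general DRQN idea the scheme builds on. A plain LSTM layer stands in here for the dissertation's dimension-divided LSTM design, and all names and sizes are assumptions.

```python
# Minimal recurrent Q-network (DRQN-style) sketch for a POMDP.
# A plain LSTM stands in for the dissertation's dimension-divided LSTM.
import torch
import torch.nn as nn

OBS_DIM, HIDDEN, N_ACTIONS = 12, 64, 8  # assumed sizes

class RecurrentQNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(OBS_DIM, HIDDEN)
        self.lstm = nn.LSTM(HIDDEN, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, N_ACTIONS)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, OBS_DIM) sequence of partial observations.
        x = torch.relu(self.encoder(obs_seq))
        x, hidden = self.lstm(x, hidden)  # hidden state summarizes history
        return self.head(x), hidden       # Q-values at every time step

# At decision time the agent carries the LSTM hidden state across steps,
# so Q-values implicitly depend on the inferred (predicted) user trajectory.
net = RecurrentQNet()
obs = torch.randn(1, 1, OBS_DIM)          # one new observation
q_values, h = net(obs, hidden=None)
action = q_values[0, -1].argmax().item()
```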
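For item (3), the following sketch outlines the core TD3 update that the proposed schemes improve upon: twin critics whose minimum curbs Q-value overestimation, target-action smoothing with clipped noise, and delayed actor updates. Prioritized memory replay is indicated only in a comment; all names, dimensions, and hyperparameters are assumptions.

```python
# Minimal TD3 update sketch for continuous power/subchannel allocation.
# Names, dimensions, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM, GAMMA = 16, 4, 0.99

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 128), nn.ReLU(), nn.Linear(128, out))

actor, actor_t = mlp(STATE_DIM, ACT_DIM), mlp(STATE_DIM, ACT_DIM)
critic1, critic1_t = mlp(STATE_DIM + ACT_DIM, 1), mlp(STATE_DIM + ACT_DIM, 1)
critic2, critic2_t = mlp(STATE_DIM + ACT_DIM, 1), mlp(STATE_DIM + ACT_DIM, 1)
for tgt, src in [(actor_t, actor), (critic1_t, critic1), (critic2_t, critic2)]:
    tgt.load_state_dict(src.state_dict())
critic_opt = torch.optim.Adam(
    [*critic1.parameters(), *critic2.parameters()], lr=1e-3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

def td3_update(s, a, r, s2, done, step, policy_delay=2):
    # Target policy smoothing: add clipped noise to the target action.
    with torch.no_grad():
        noise = (0.2 * torch.randn_like(a)).clamp(-0.5, 0.5)
        a2 = (torch.tanh(actor_t(s2)) + noise).clamp(-1.0, 1.0)
        sa2 = torch.cat([s2, a2], dim=1)
        # Twin critics: take the minimum to curb overestimation.
        q_t = torch.min(critic1_t(sa2), critic2_t(sa2)).squeeze(1)
        target = r + GAMMA * (1 - done) * q_t
    sa = torch.cat([s, a], dim=1)
    # With prioritized replay, each sample's TD error would also set its
    # sampling priority and an importance-sampling weight on this loss.
    critic_loss = nn.functional.mse_loss(critic1(sa).squeeze(1), target) \
                + nn.functional.mse_loss(critic2(sa).squeeze(1), target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    if step % policy_delay == 0:  # delayed actor (and target) updates
        actor_loss = -critic1(
            torch.cat([s, torch.tanh(actor(s))], dim=1)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        # Target networks would then be Polyak-averaged toward the online ones.
```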
Keywords/Search Tags: space-air-ground networks, unmanned aerial vehicle assisted communication, space-air cooperative communication, resource management technique, deep reinforcement learning