
Dynamic Resource Allocation Of Aerial-based Relay Based On Deep Reinforcement Learning

Posted on: 2023-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: W J Wang
Full Text: PDF
GTID: 2532306905969019
Subject: Information and Communication Engineering
Abstract/Summary:
With its rapid development and popularization, unmanned aerial vehicle (UAV) technology is emerging in many industries. UAVs have also become a promising way to overcome the high cost and high delay of unmanned surface vessel (USV) communication. With low cost, high mobility, and flexible deployment, UAVs can act as aerial relays that provide a USV cluster with communication access and information forwarding, adapting dynamically to changing environments and task requirements. Because onboard resources are limited, how to allocate them efficiently and coordinate multiple UAV relays has attracted wide attention in both academia and industry. Based on deep reinforcement learning, this thesis studies collaborative dynamic resource-allocation algorithms for the single-UAV-relay and multiple-UAV-relay scenarios.

First, the resource-allocation task of the aerial relay is modeled and formulated as an optimization problem. The problem is non-convex and difficult to solve directly with traditional optimization methods, so a reinforcement learning approach is employed instead. The basic theory of deep reinforcement learning and its typical algorithms are then analyzed and compared.

Second, dynamic resource allocation in the single-UAV-relay scenario is studied in depth. Here, the UAV's deployment location and the bandwidth assignment must be decided. Because the deep deterministic policy gradient (DDPG) algorithm suffers from a low sampling rate and low training efficiency, a Gated Recurrent Unit (GRU) is introduced into the policy network and the value network, respectively, to correlate experience data over time. Three recurrent reinforcement learning algorithms are proposed: GA-DDPG, GC-DDPG, and GAC-DDPG. State observations, decision actions, and the reward function are designed for the experiments. Simulation results show that, compared with the traditional DDPG algorithm, the convergence speed of the three algorithms improves by 37.9%, 65.5%, and 65.5%, respectively, which further indicates that the value network plays the more critical role in reinforcement learning training.

Then, cooperative resource allocation in the multiple-UAV-relay scenario is studied, where user scheduling, UAV placement, and bandwidth assignment must all be considered. A value decomposition network with a multi-head attention mechanism selectively combines the outputs of each agent's value network into a global value function, so that the joint policy can be evaluated from a global perspective. This establishes a "quasi-distributed" training framework in which UAV relays need not transmit their own state information, avoiding interception and eavesdropping and preserving data privacy and security. Experimental results show that the convergence speed of the proposed AC-Mix and MA2DDPG algorithms increases by 30.0% and 63.3% over the benchmark algorithm, and the average episode reward increases by 15.8% and 16.9%, which demonstrates that the proposed algorithms compensate for the information loss caused by partial observation and strengthen cooperation between agents.
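The single-relay idea of inserting a GRU into the actor so that past observations inform the current decision can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual GA-DDPG architecture: all names and sizes (`obs_dim`, `hid_dim`, `n_users`), the random weights, and the specific action head (a 2-D position offset plus per-user bandwidth shares) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, hid_dim, n_users = 8, 16, 4  # assumed dimensions

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# GRU parameters (randomly initialized for the sketch; learned in practice)
W = {k: rng.normal(0, 0.1, (hid_dim, obs_dim)) for k in ("z", "r", "h")}
U = {k: rng.normal(0, 0.1, (hid_dim, hid_dim)) for k in ("z", "r", "h")}

def gru_step(x, h):
    """One GRU step: the gates decide how much past experience to keep."""
    z = sigmoid(W["z"] @ x + U["z"] @ h)               # update gate
    r = sigmoid(W["r"] @ x + U["r"] @ h)               # reset gate
    h_cand = np.tanh(W["h"] @ x + U["h"] @ (r * h))    # candidate state
    return (1.0 - z) * h + z * h_cand

# Assumed actor head: hidden state -> UAV position offset + bandwidth shares
W_pos = rng.normal(0, 0.1, (2, hid_dim))
W_bw = rng.normal(0, 0.1, (n_users, hid_dim))

def actor(obs_seq):
    h = np.zeros(hid_dim)
    for obs in obs_seq:            # recurrence correlates the trajectory
        h = gru_step(obs, h)
    pos = np.tanh(W_pos @ h)       # bounded position update in [-1, 1]
    logits = W_bw @ h
    bw = np.exp(logits) / np.exp(logits).sum()  # bandwidth shares sum to 1
    return pos, bw

pos, bw = actor(rng.normal(size=(5, obs_dim)))
```

In a full DDPG loop this actor would be trained from a critic's gradient; the point of the sketch is only that the recurrent hidden state lets a single forward pass condition on the whole observation history rather than one frame.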
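The multi-relay part combines per-agent value estimates into one global value via multi-head attention. A minimal sketch of that mixing step is below; it is an illustrative assumption, not the thesis's AC-Mix network: the embedding sizes, the learned queries/keys (`W_q`, `W_k`), and averaging over heads are all placeholders for the real learned mixer.

```python
import numpy as np

rng = np.random.default_rng(1)
n_agents, emb_dim, n_heads = 3, 8, 2  # assumed sizes

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Per-agent embeddings (would come from each agent's value network) and
# each agent's local Q estimate
emb = rng.normal(size=(n_agents, emb_dim))
q_i = rng.normal(size=n_agents)

# One learned query per head attends over projected agent keys
W_q = rng.normal(0, 0.1, (n_heads, emb_dim))           # queries
W_k = rng.normal(0, 0.1, (n_heads, emb_dim, emb_dim))  # key projections

def mix(q_i, emb):
    """Selectively weight each agent's Q and combine them into Q_tot."""
    head_vals = []
    for h in range(n_heads):
        keys = emb @ W_k[h].T                              # (n_agents, emb_dim)
        attn = softmax(keys @ W_q[h] / np.sqrt(emb_dim))   # weights sum to 1
        head_vals.append(attn @ q_i)                       # weighted mix
    return float(np.mean(head_vals))                       # average heads

q_tot = mix(q_i, emb)
```

Because each head's weights are a softmax, every head produces a convex combination of the local Q values, so the mixed Q_tot always lies between the smallest and largest per-agent estimate; the attention weights are what let training emphasize the agents whose local view matters most for the joint evaluation.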
Keywords/Search Tags: Collaborative resource allocation, Multi-agent reinforcement learning, Gated Recurrent Unit, Value decomposition network, Attention mechanism