Reinforcement Learning Based Resource Allocation Of Energy Harvesting MQAM Wireless Communication System

Posted on: 2020-04-20; Degree: Master; Type: Thesis
Country: China; Candidate: M Y Li
GTID: 2428330575481381; Subject: Communication and Information System
Abstract/Summary:
Energy harvesting (EH) in wireless communication refers to technology that lets a communication device collect renewable energy, such as solar, wind, and electromagnetic energy, from its surrounding environment. A communication node with energy harvesting equipment can be deployed more flexibly and depends less on the power grid. It fits the goals of energy saving and emission reduction and represents the development trend of green communication. Combining energy harvesting with M-ary Quadrature Amplitude Modulation (MQAM), a traditional communication technology, can simultaneously alleviate the shortage of fossil energy and of spectrum resources. This thesis focuses on resource allocation for the energy harvesting MQAM wireless communication system. The main contents are as follows:

1) To improve the capacity of the communication system and the utilization efficiency of harvested energy, the resource allocation problem of an EH-MQAM point-to-point wireless communication system is studied with reinforcement learning. Because energy arrives in random bursts and the wireless channel varies and fades, the energy arrival and channel states cannot be predicted. Traditional convex optimization is therefore no longer suitable for such systems, so this thesis uses reinforcement learning to solve the throughput maximization problem. First, the most basic tabular value-function reinforcement learning algorithms, Q-learning and SARSA, are used to find the optimal transmission policy for each time slot of the time-slotted communication system. Then the convergence of Q-learning and SARSA is proved mathematically. Finally, simulation experiments show that both algorithms converge. With the converged Q table, they find the optimal transmission policy and outperform other traditional transmission policies in throughput.

2) The Q-learning and SARSA algorithms converge slowly and need additional memory to store the action-value function table. To address this, this thesis applies a value function approximation SARSA algorithm based on the Tile-Coding method. According to the main characteristics of the optimization problem in this thesis, three sets of Dirichlet functions are designed as basis functions. The inner product of the basis function vector and a weight vector approximates the action-value function that would otherwise be stored in the table. Simulation experiments show that the approximate SARSA algorithm also finds the optimal transmission policy, converges quickly, and does not occupy extra memory, which makes it better suited to small and flexible wireless communication devices.

3) To avoid manually searching for system features to construct basis functions, as required in 2), this thesis further explores value function approximation and uses a neural network, a powerful fitting tool, to extract the features of the transmission policy automatically. Two techniques from the Deep Q Network (DQN) algorithm, experience replay and a target network, are used to mitigate the difficulty of getting the neural network to converge. Simulation experiments show that the DQN algorithm also finds the optimal transmission policy of the communication system studied here, converges very quickly, and requires no manual feature design.
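
As an illustration of the tabular methods in item 1), the following is a minimal sketch of Q-learning and SARSA on a toy energy harvesting link. It is not the thesis's system model. The battery size, the two channel gains, the harvesting probability, the log2 throughput proxy, and every function name and hyperparameter are assumptions made only for this example. The state is (battery level, channel index) and the action is the number of energy units spent in the current time slot.

```python
import math
import random

B_MAX = 4            # battery capacity in energy units (assumed)
GAINS = (0.5, 2.0)   # two channel power gains (assumed)
P_HARVEST = 0.5      # chance that one energy unit arrives in a slot (assumed)


def step(state, action, rng):
    """Spend `action` energy units, then harvest and draw a new channel state."""
    battery, ch = state
    reward = math.log2(1.0 + GAINS[ch] * action)  # throughput proxy (assumed)
    harvested = 1 if rng.random() < P_HARVEST else 0
    next_battery = min(B_MAX, battery - action + harvested)
    next_ch = rng.randrange(len(GAINS))           # i.i.d. channel for simplicity
    return (next_battery, next_ch), reward


def epsilon_greedy(q, state, eps, rng):
    """Pick a feasible action: random with probability eps, else greedy."""
    actions = list(range(state[0] + 1))           # cannot spend more than stored
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def train(algorithm="q_learning", slots=50000, alpha=0.1, gamma=0.9,
          eps=0.1, seed=0):
    """Run one long trajectory and return the learned Q table."""
    rng = random.Random(seed)
    q = {((b, c), a): 0.0
         for b in range(B_MAX + 1)
         for c in range(len(GAINS))
         for a in range(b + 1)}
    state = (0, 0)
    action = epsilon_greedy(q, state, eps, rng)
    for _ in range(slots):
        next_state, reward = step(state, action, rng)
        next_action = epsilon_greedy(q, next_state, eps, rng)
        if algorithm == "sarsa":
            # On-policy: bootstrap from the action that will actually be taken.
            target = q[(next_state, next_action)]
        else:
            # Off-policy: bootstrap from the best feasible next action.
            target = max(q[(next_state, a)] for a in range(next_state[0] + 1))
        q[(state, action)] += alpha * (reward + gamma * target - q[(state, action)])
        state, action = next_state, next_action
    return q


def greedy_policy(q):
    """Map each state to the feasible action with the highest Q value."""
    states = {key[0] for key in q}
    return {s: max(range(s[0] + 1), key=lambda a: q[(s, a)]) for s in states}
```

Calling `greedy_policy(train("sarsa"))` or `greedy_policy(train("q_learning"))` returns a dictionary from states to energy units to spend. The only difference between the two algorithms in this sketch is the bootstrap target, which is the on-policy versus off-policy distinction between SARSA and Q-learning.
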
Keywords/Search Tags: Resource allocation, Energy harvesting, Reinforcement learning, Markov decision process, MQAM, Deep Q network