With the rapid development of communication technology, the contradiction between the demands of massive numbers of users and increasingly scarce communication resources has become more and more prominent. Device-to-device (D2D) communication is a direct-connection technology that effectively relieves the load on the base station. In a D2D system, however, unreasonable waveform parameter decisions usually lead to wasted resources, reduced system throughput, and even communication interruption. Waveform parameter decision improves the communication performance of the system by reasonably adjusting the waveform parameters of the wireless signal to adapt to different channels. Existing waveform parameter decision methods, however, still suffer from a large amount of computation, few decision parameters, and poor generalization ability. Reinforcement learning learns by interacting directly with the environment, without prior knowledge, and is therefore well suited to decision-making problems. This thesis studies reinforcement-learning-based waveform parameter decision algorithms for D2D communication systems. The specific research contents are as follows:

Firstly, a new reinforcement-learning-based waveform parameter decision model for D2D systems is proposed. Compared with the traditional waveform parameter decision model, the new model covers more parameters and has stronger generalization ability. To address the slow convergence of decisions on D2D users' access frequency and transmission power, a distributed decision algorithm based on Actor-Critic (AC), named M-AC, is proposed, in which each user is assigned two neural networks. Simulation results show that the M-AC algorithm effectively improves system throughput and converges faster. In the AC network, however, the policy gradient is updated from the joint-action reward, ignoring the contribution of each individual user's action, which can leave system throughput low. The AC algorithm is therefore further improved by introducing credit assignment, so that each user's individual action reward value is taken into account. Simulation results show that the improved C-AC algorithm further increases system throughput.

Secondly, aiming at the need for channel estimation and the poor generalization ability of traditional modulation mode and coding rate decision, a decision algorithm based on Q-learning and Sarsa(λ) is proposed. Compared with the traditional Adaptive Modulation and Coding (AMC) technique, the proposed algorithm does not need to estimate the channel: it makes parameter decisions directly from the actual system throughput and can adapt to different channel environments. To address the redundant fluctuation and slow convergence in the decision process caused by the large exploration action space, the algorithm is improved by dynamically shrinking the action space. Simulation results show that the improved algorithm has a smaller mean square error, that the improved Sarsa(λ) algorithm converges faster than Q-learning, and that the resulting system throughput exceeds that of the Modulation and Coding Scheme (MCS) index table. Finally, to address the large initial mean square error caused by the system's random initial state, the improved Sarsa(0.1) algorithm is further combined with the MCS index table. Experimental results show that the initial mean square error of the optimized algorithm is effectively reduced and convergence is faster.
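The per-user Actor-Critic structure described above can be sketched minimally in Python. This is an illustrative toy, not the thesis's implementation: it uses a tabular softmax actor and a TD(0) critic in place of the two neural networks, and a hypothetical one-state, two-action "frequency choice" game where two users share a joint reward, as in M-AC.

```python
import numpy as np

rng = np.random.default_rng(0)

class UserAC:
    """One D2D user's actor-critic pair: a softmax actor over actions and a
    critic estimating the state value (tabular stand-ins for the two networks)."""
    def __init__(self, n_states, n_actions, alpha_a=0.05, alpha_c=0.1, gamma=0.9):
        self.theta = np.zeros((n_states, n_actions))  # actor preferences
        self.w = np.zeros(n_states)                   # critic state values
        self.alpha_a, self.alpha_c, self.gamma = alpha_a, alpha_c, gamma

    def act(self, s):
        prefs = self.theta[s]
        probs = np.exp(prefs - prefs.max())
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs)), probs

    def update(self, s, a, r, s_next, probs):
        td_error = r + self.gamma * self.w[s_next] - self.w[s]  # critic TD error
        self.w[s] += self.alpha_c * td_error                    # critic update
        grad = -probs
        grad[a] += 1.0                                          # grad of log-softmax
        self.theta[s] += self.alpha_a * td_error * grad         # actor policy-gradient step

# Toy joint-reward game (an assumption of this sketch): two users pick one of
# two frequencies; the shared reward is higher when they avoid colliding.
users = [UserAC(1, 2), UserAC(1, 2)]
for _ in range(3000):
    acts, probs = zip(*(u.act(0) for u in users))
    r = 1.0 if acts[0] != acts[1] else 0.2      # joint-action reward (M-AC style)
    for u, a, p in zip(users, acts, probs):
        u.update(0, a, r, 0, p)
print([u.act(0)[1].round(2) for u in users])
```

Replacing the shared reward `r` with a per-user reward term is, in this sketch, where the credit-assignment idea of C-AC would enter: each user's gradient step then reflects its own action's contribution rather than the joint outcome alone.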
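The Sarsa(λ)-based modulation-and-coding decision, including the dynamic action-space reduction, can be sketched as follows. The six-entry rate table, the logistic success model, and the shrink-after-N-steps rule are illustrative assumptions standing in for the thesis's simulation; the key property from the abstract is preserved: the learner sees only the realized throughput, never an explicit channel estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical link model: six MCS options with increasing rate but an
# increasing SNR requirement (values are illustrative, not the thesis's).
RATES = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 4.0])
REQ_SNR = np.array([2.0, 5.0, 8.0, 12.0, 16.0, 20.0])
SNR = 12.0

def send(a):
    """Realized throughput of one transmission with MCS index a."""
    p_ok = 1.0 / (1.0 + np.exp(REQ_SNR[a] - SNR))  # success prob. vs. SNR margin
    return RATES[a] if rng.random() < p_ok else 0.0

def sarsa_lambda(steps=6000, alpha=0.1, gamma=0.5, lam=0.1, eps=0.1, shrink_at=3000):
    n = len(RATES)
    Q, e = np.zeros(n), np.zeros(n)
    allowed = np.arange(n)                          # current (possibly reduced) action space

    def pick():
        if rng.random() < eps:
            return int(rng.choice(allowed))
        return int(allowed[np.argmax(Q[allowed])])

    a = pick()
    for t in range(steps):
        r = send(a)                                 # reward = actual throughput
        a_next = pick()
        delta = r + gamma * Q[a_next] - Q[a]        # Sarsa TD error
        e[a] += 1.0                                 # accumulating eligibility trace
        Q += alpha * delta * e
        e *= gamma * lam
        if t == shrink_at:                          # dynamic action-space reduction:
            best = int(np.argmax(Q))                # keep only the neighbourhood of
            allowed = np.arange(max(0, best - 1), min(n, best + 2))  # the current best MCS
        a = a_next
    return Q

Q = sarsa_lambda()
print(int(np.argmax(Q)))   # greedy MCS index after learning
```

Shrinking `allowed` around the current best index is one simple way to realize the "dynamically reduce the action space" improvement: it removes clearly inferior MCS indices from exploration, which is what curbs the redundant fluctuation the abstract refers to.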
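The final optimization, combining the learner with the MCS index table, amounts to warm-starting the Q-table so that the very first greedy decisions follow the table's recommendation instead of a random initial state, which is what reduces the initial mean square error. A sketch under a hypothetical four-bucket table (the mapping and rate values are illustrative):

```python
import numpy as np

# Hypothetical MCS index table: measured SNR bucket -> suggested MCS index.
MCS_TABLE = {0: 0, 1: 1, 2: 3, 3: 5}
RATES = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 4.0])

def warm_start_q(n_states, n_actions):
    """Initialize the Q-table so that, before any learning, the greedy
    action in each SNR bucket is the table-recommended MCS index."""
    Q = np.zeros((n_states, n_actions))
    for s, a in MCS_TABLE.items():
        Q[s, a] = RATES[a]          # small optimistic bump for the table's pick
    return Q

Q = warm_start_q(4, 6)
print([int(np.argmax(Q[s])) for s in range(4)])  # → [0, 1, 3, 5]
```

Learning then proceeds as before, so the agent can still move away from the table's suggestion when the actual throughput says otherwise; the table only fixes the starting point.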