The exponential growth of massive machine-type communication (MTC) devices in cellular IoT networks places great strain on limited radio resources, resulting in congestion and poor resource allocation during random access. To solve the congestion and power allocation problems caused by the insufficient access capacity available to large numbers of MTC devices, and to improve resource utilization, this thesis draws on reinforcement learning to propose a learning-based access strategy and power allocation scheme for multi-cell network scenarios. The main contributions are as follows:

(1) To address access congestion among large-scale machine-type communication devices in multi-cell networks, this thesis proposes a double deep Q-network algorithm with value-difference based exploration. The algorithm focuses on reducing collisions when many MTC devices access gNBs in a multi-cell network. The state transition process of the deep reinforcement learning algorithm is modeled as a Markov decision process. The algorithm uses a double deep Q-network to fit the target state-action value function, helping each MTC device choose the best gNB for itself, and it uses a value-difference based exploration strategy to adapt to changes in the environment, taking advantage of both current conditions and expected future needs. Simulation results show that the proposed scheme effectively improves the access success rate of the system and reduces network congestion.

(2) To maximize the average transmission rate in a multi-cell cellular network with interference, this thesis proposes a deep deterministic policy gradient (DDPG) algorithm based on incremental learning for power allocation. The algorithm achieves continuous power allocation through two-step learning. In the first step, a DDPG network model is pre-trained offline: the actor network outputs a single deterministic continuous action, and the critic network continuously refines action decisions based on the interference state. This avoids the quantization error and high-dimensionality problems of traditional deep reinforcement learning methods that discretize the action space. In the second step, an incremental training method learns the model online and fine-tunes the network, using both historical information and newly generated state information to update the network parameters, which prevents new training from overwriting the original model. Simulation results show that the proposed algorithm outperforms the baseline and several reinforcement learning algorithms in average rate performance and generalization ability.
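
The first contribution combines two standard ingredients: the double-DQN target, in which the online network selects the next action and the target network evaluates it, and a value-difference based exploration rate in the style of Tokic's VDBE rule, in which a large temporal-difference error raises the probability of exploring. The following is a minimal PyTorch sketch of these two pieces; the network sizes, sigma, and delta are illustrative assumptions, not the thesis's settings.

```python
# Minimal sketch (assumed hyperparameters): double-DQN target plus a
# value-difference based exploration (VDBE-style) epsilon update for
# choosing a gNB among several candidates.
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Maps an observed access state to one Q-value per candidate gNB."""
    def __init__(self, n_state, n_gnb, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_state, hidden), nn.ReLU(),
            nn.Linear(hidden, n_gnb),
        )

    def forward(self, s):
        return self.net(s)

def double_dqn_target(online, target, reward, s_next, gamma=0.99):
    # The online net selects the next action and the target net
    # evaluates it, which reduces the overestimation bias of a
    # single-network max over Q-values.
    with torch.no_grad():
        a_star = online(s_next).argmax(dim=1, keepdim=True)
        q_next = target(s_next).gather(1, a_star).squeeze(1)
    return reward + gamma * q_next

def vdbe_epsilon(eps, td_error, sigma=1.0, delta=0.1):
    # Value-difference based exploration: a large TD error means the
    # value estimate is still changing, so exploration increases;
    # as estimates converge, epsilon decays toward exploitation.
    x = torch.exp(-td_error.abs() / sigma)
    f = (1 - x) / (1 + x)
    return delta * f + (1 - delta) * eps

def select_gnb(qnet, s, eps):
    # Epsilon-greedy gNB choice using the current (state-dependent) epsilon.
    if torch.rand(()) < eps:
        return torch.randint(qnet.net[-1].out_features, ())
    return qnet(s.unsqueeze(0)).argmax(dim=1).squeeze(0)
```

In this setup each MTC device updates its own epsilon from its latest TD error, so exploration rises exactly where the access environment has changed, matching the adaptivity claim above.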
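
For the second contribution, the sketch below illustrates the two mechanisms named above under assumed PyTorch shapes and an illustrative mixing ratio: a DDPG-style actor that emits one deterministic continuous power level bounded by p_max (avoiding the quantization error of a discrete power codebook), a critic that scores state-action pairs against the interference state, and an incremental minibatch that mixes historical transitions with newly collected ones so online fine-tuning does not overwrite the pre-trained model.

```python
# Minimal sketch (assumed dimensions and mixing ratio): DDPG-style
# actor/critic for continuous power allocation, plus an incremental
# fine-tuning batch that blends old and new experience.
import random
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: interference state -> continuous power in (0, p_max]."""
    def __init__(self, n_state, p_max, hidden=64):
        super().__init__()
        self.p_max = p_max
        self.net = nn.Sequential(
            nn.Linear(n_state, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, s):
        # A single continuous action, so no discrete power codebook
        # and hence no quantization error.
        return self.p_max * self.net(s)

class Critic(nn.Module):
    """Q(s, a): scores a power choice given the current interference state."""
    def __init__(self, n_state, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_state + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=1))

def incremental_batch(old_buffer, new_buffer, batch=64, new_ratio=0.5):
    # Draw part of each minibatch from the offline (historical) buffer
    # and part from newly observed transitions, so online fine-tuning
    # adapts to the current interference state without forgetting what
    # was learned during offline pre-training.
    k_new = int(batch * new_ratio)
    return random.sample(new_buffer, k_new) + \
           random.sample(old_buffer, batch - k_new)
```

Mixing replayed and fresh transitions is one plain way to realize the "historical plus newly generated information" update described above; the exact buffer management and ratio in the thesis may differ.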