| Power allocation is a major problem in spectrum allocation, and building a cognitive wireless network model is currently a feasible solution. As the network environment changes dynamically, cognitive users can adjust their power control strategies so that the interference they cause to authorized users stays sufficiently small. Cognitive users can then access the spectrum opportunistically and share spectrum resources with authorized users, which improves spectrum utilization. Many scholars use reinforcement learning algorithms to build such models, but a conventional reinforcement learning algorithm is controlled by a single agent and suffers from limitations such as the curse of dimensionality. In complex scenarios, where the number of cognitive users grows and the wireless environment changes dynamically, traditional reinforcement learning struggles to meet the demands of a complex and dynamic cognitive wireless network. Multi-agent reinforcement learning uses multiple agents to control multiple modules, dividing a complex problem into several smaller ones, and to a certain extent overcomes the limitations of single-agent reinforcement learning in complex environments. This thesis therefore focuses on combining and optimizing multi-agent reinforcement learning algorithms with power control in spectrum allocation, so that the reinforcement learning algorithm is more robust and adaptable in more complex, dynamic environments. The main work of this thesis is as follows: (1) It introduces the background and existing problems of resource allocation in cognitive wireless networks. (2) It summarizes the fundamentals of cognitive radio power allocation, single-agent reinforcement learning, and multi-agent reinforcement learning. (3) To address the curse of dimensionality and the dynamic, non-stationary environment that reinforcement learning faces in cognitive radio power allocation, each secondary user in the cognitive wireless network uses a multi-agent deep deterministic policy gradient (MADDPG) agent for power control. The action exploration mode of MADDPG is changed from adding noise to random sampling, and the network update mode is changed to delayed update, which makes the MADDPG algorithm more efficient and stable in the power control scenario. (4) A multi-user dynamic power control method for cognitive radio is proposed to address the "utilitarian" behavior of agents. Building on the combination of MADDPG and power allocation and targeting its defects, a prioritized experience pool and an improved reward function are added to MADDPG to increase the convergence speed and improve the stability of the algorithm. |
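
The abstract does not specify how the priority experience pool is built. As a rough illustration only, the following is a minimal sketch of one common variant, proportional prioritized replay, assuming that priorities follow the absolute temporal-difference (TD) error. The class name `PrioritizedPool`, the `Transition` fields, and the parameters `alpha` and `eps` are hypothetical and are not taken from the thesis; importance-sampling correction is omitted for brevity.

```python
import random
from collections import namedtuple

# Hypothetical transition record; the field names are illustrative only.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class PrioritizedPool:
    """Proportional prioritized experience pool (illustrative sketch)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity  # maximum number of stored transitions
        self.alpha = alpha        # 0 = uniform sampling, 1 = fully proportional
        self.eps = eps            # keeps every priority strictly positive
        self.items = []
        self.priorities = []
        self.next_slot = 0        # index overwritten once the pool is full

    def add(self, transition):
        # New transitions get the current maximum priority so that each
        # one is likely to be replayed at least once.
        priority = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(priority)
        else:
            self.items[self.next_slot] = transition
            self.priorities[self.next_slot] = priority
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        # Draw indices with probability proportional to priority ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        indices = rng.choices(range(len(self.items)), weights=weights, k=batch_size)
        return indices, [self.items[i] for i in indices]

    def update(self, indices, td_errors):
        # After a learning step, priority follows the absolute TD error.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a setup like the one described, each secondary user's agent would call `add` after every environment step, `sample` before a learning step, and `update` with the new TD errors afterwards, so that transitions with larger errors are replayed more often.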