| In recent years, spectrum resources have been used more and more heavily, an inevitable result of advances in communication technology and the spread of communication terminals. Research results indicate that the shortage of spectrum resources and their low utilization efficiency stem from the unreasonable spectrum allocation strategies applied to different wireless access technologies. Dynamic spectrum access, a key technology of cognitive radio, was proposed in response to this situation. Cognitive users in the network accurately sense the current spectrum state and allocate spectrum resources dynamically; that is, they find spectrum holes and access them in time for communication. The technology therefore both protects the normal communication of authorized users and makes full, flexible use of spectrum resources. How cognitive users can opportunistically access channels in an unknown environment while minimizing the probability of collision with authorized users and with other cognitive users has become a key problem in improving spectrum efficiency. Based on reinforcement learning theory, this paper studies dynamic spectrum access algorithms in different wireless network scenarios. The specific contents are as follows. First, this paper considers the non-cooperative spectrum access scenario of a traditional cell with multiple cellular users, builds a practically motivated model, and gives the design parameters of the system model in detail. Based on this model and reinforcement learning theory, it focuses on a spectrum access algorithm based on the deep Q-network, and the effectiveness of the optimized algorithm is verified by simulation. Second, D2D communication is a key technology in the next-generation mobile communication system: it allows adjacent mobile terminals to use the authorized spectrum and transmit data end-to-end, which saves transmission resources, reduces base station load, and improves frequency utilization. It is therefore of great significance to further study a D2D communication access optimization algorithm based on reinforcement learning. Third, to solve the problems found in the preceding work, the algorithm is improved by separating the objective function from the Q-function, which provides an algorithmic framework for generalized Q-learning so that more objectives can be achieved. The simulation results show that, after the logarithm of the objective function is optimized, spectrum access in the two networks can achieve the theoretically optimal total throughput, reduce the fluctuation of a single DSMA user's throughput, or, at the same time, solve the problem of spectrum resource fairness among D2D users. Finally, the optimization of spectrum access based on multi-agent reinforcement learning algorithm design is explored, which gives the research greater practical guiding significance. |
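
To make the core idea concrete, the following is a minimal sketch of reinforcement-learning-based channel access. It is not the deep Q-network algorithm studied in this paper: it swaps in a much simpler single-state, tabular Q-learning rule for one cognitive user. The function name `learn_channel_access`, the per-channel occupancy probabilities, the reward values of +1 and -1, and all hyperparameters are assumptions chosen purely for illustration.

```python
import random


def learn_channel_access(busy_prob, steps=20000, alpha=0.02, epsilon=0.1, seed=0):
    """Single-state Q-learning for one cognitive user choosing among channels.

    busy_prob[c] is a hypothetical probability that an authorized user
    occupies channel c in a given slot. The reward is +1 for a
    collision-free access and -1 for a collision (illustrative values).
    """
    rng = random.Random(seed)
    q = [0.0] * len(busy_prob)
    for _ in range(steps):
        # Epsilon-greedy: explore a random channel with probability epsilon,
        # otherwise access the channel with the highest current estimate.
        if rng.random() < epsilon:
            channel = rng.randrange(len(q))
        else:
            channel = max(range(len(q)), key=lambda c: q[c])
        # Simulate whether the authorized user is transmitting in this slot.
        collided = rng.random() < busy_prob[channel]
        reward = -1.0 if collided else 1.0
        # With a single state there is no next-state term, so the update
        # moves the estimate toward the observed reward.
        q[channel] += alpha * (reward - q[channel])
    return q


# Assumed scenario: three channels, the last one mostly idle.
q_values = learn_channel_access([0.9, 0.5, 0.1])
best_channel = max(range(len(q_values)), key=lambda c: q_values[c])
print(best_channel, [round(v, 2) for v in q_values])
```

In this toy setting the learned estimates rank the channels by how often they are free, so the greedy choice settles on the channel least used by the authorized user. A deep Q-network replaces the table `q` with a neural network over observed spectrum states, and the multi-user scenarios in this paper additionally involve collisions among cognitive users, which this sketch does not model.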