With the deployment of fifth-generation (5G) mobile communication technology, wireless communication network services are developing rapidly and the number of mobile devices is growing explosively. However, the available spectrum resources are limited, and spectrum scarcity is becoming increasingly serious. This is especially true for emergency communication networks in extremely harsh environments, where efficiently restoring communication services and improving the quality of service is a crucial and arduous task for disaster assessment and relief. Cognitive radio network (CRN) technology enables multiple secondary users to intelligently sense and select spectrum holes that are not occupied by primary users. At the same time, spectrum aggregation and dynamic spectrum access (DSA) technologies are combined to meet users' high bandwidth demands and further improve spectrum utilization. In real communication networks, however, the spectrum environment changes in complex ways, which makes spectrum access more challenging for secondary users. Spectrum access based on multi-agent reinforcement learning is a new approach to dealing with such dynamic spectrum environments. The main purpose of this thesis is to design multi-agent reinforcement learning algorithms that solve the multi-user, multi-channel spectrum access problem in different wireless communication networks, and to evaluate the effectiveness, robustness and complexity of these algorithms through performance analysis of the users and the network.

Firstly, this thesis briefly introduces the scarcity of spectrum resources in emerging wireless communication networks and emergency communication networks to motivate its research purpose and significance, and then reviews the current state of effective approaches such as CRN and machine learning (ML) algorithms. It then introduces the relevant concepts of emergency communication networks and elaborates on the principles of spectrum sensing, spectrum aggregation and spectrum access in CRN. It also summarizes the principles of reinforcement learning (RL) and the advantages and disadvantages of various classical RL algorithms, laying the foundation for the subsequent research.

Secondly, this thesis studies multi-user spectrum aggregation and access in a public wireless communication network model in which secondary users have different bandwidth demands, spectrum sensing capabilities and spectrum aggregation capabilities. A bandwidth priority-based spectrum access scheme is designed in which secondary users sense and access the spectrum one after another, so that multiple secondary users do not repeatedly select the same frequency band. Because conventional RL algorithms are limited in complex multi-user, multi-channel spectrum environments, a multi-agent actor-critic algorithm is proposed in which the agents share action information to achieve the optimal benefit for the whole network and all users. Comparison with the deep Q-network (DQN) algorithm verifies that the proposed algorithm can handle a relatively large number of channels and effectively improves the successful access probability of secondary users.

Finally, this thesis proposes a dynamic spectrum aggregation and access algorithm for unmanned aerial vehicle-assisted rescue cognitive networks (UAV-RCNs), addressing multi-user spectrum access under the spectrum and energy constraints of UAVs. Because the bandwidth priority scheme designed for the public communication network model in Chapter 3 significantly slows down user access, delay grows rapidly as the number of users increases. This thesis therefore removes the bandwidth priority, takes into account spectrum changes caused by UAV movement, and constructs an emergency communication network model. It proposes a multi-agent actor-critic algorithm based on the maximum entropy method to manage multi-user access. Introducing maximum entropy encourages exploration by the user agents and prevents convergence to a suboptimal deterministic policy. The proposed algorithm adopts a distributed execution and training framework to fully optimize the stochastic policy, with the ultimate goal of sharing spectrum resources between primary and secondary users in emergency communication networks.

The proposed maximum entropy-based multi-agent actor-critic algorithm is compared with the classic actor-critic algorithm and the DQN algorithm in terms of accumulated reward, average available transmission rate, average collision rate, power consumption and complexity. Simulation results show that it significantly improves users' transmission rates and reduces both collisions between users and power consumption, verifying the strong performance of the proposed algorithm for spectrum access in emergency communication networks.
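The idea behind the bandwidth priority-based access scheme, where secondary users sense and access the spectrum one after another so that no band is selected twice, can be illustrated with a minimal sketch. It assumes, purely for illustration, that "priority" means descending bandwidth demand, that demand is counted in whole channels, and that each secondary user has a known set of sensable channels and a maximum aggregation size. The class `SecondaryUser` and the function `priority_access` are hypothetical names, not the implementation used in the thesis.

```python
from dataclasses import dataclass, field

@dataclass
class SecondaryUser:
    name: str
    demand: int          # bandwidth demand, counted in channels (assumption)
    sensable: set        # channel indices this user can sense (assumption)
    max_aggregate: int   # most channels this user can aggregate at once
    granted: list = field(default_factory=list)

def priority_access(users, idle_channels):
    """Serve users one by one, highest bandwidth demand first (assumed priority rule).

    Each user aggregates idle channels it can sense that no earlier user has taken,
    so no channel is assigned twice. A user whose demand cannot be met gets nothing.
    """
    free = set(idle_channels)
    for su in sorted(users, key=lambda u: u.demand, reverse=True):
        candidates = sorted(free & su.sensable)
        if su.demand <= su.max_aggregate and len(candidates) >= su.demand:
            su.granted = candidates[:su.demand]
            free -= set(su.granted)
        else:
            su.granted = []  # blocked in this round
    return users

users = [
    SecondaryUser("su1", demand=1, sensable={0, 1, 2}, max_aggregate=2),
    SecondaryUser("su2", demand=3, sensable={1, 2, 3, 4}, max_aggregate=3),
    SecondaryUser("su3", demand=2, sensable={0, 1, 5}, max_aggregate=2),
]
for su in priority_access(users, idle_channels=range(6)):
    print(su.name, su.granted)
```

In this toy run, su2 is served first and takes channels 1 to 3, su3 then takes 0 and 5, and su1 is blocked even though channel 4 is still idle, because it cannot sense that channel. Serving users strictly in sequence rules out same-band conflicts by construction, but every user waits for those ahead of it, which is the access-delay cost that motivates dropping the priority in the UAV-assisted model.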
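How a maximum entropy term discourages premature convergence to a deterministic policy can be shown for a single agent choosing among discrete channels. The sketch below uses the standard entropy-regularized ("soft") formulation with a temperature `alpha`, as popularized by soft actor-critic. The Q-values are hypothetical, and this is not claimed to be the exact update rule in the thesis. For fixed Q-values, the policy that minimizes the entropy-regularized actor loss is the Boltzmann policy π(a) ∝ exp(Q(a)/α), and its loss equals −V, where V = α log Σ_a exp(Q(a)/α).

```python
import math

def soft_value(q_values, alpha):
    """Soft state value V = alpha * log(sum_a exp(Q(a) / alpha)), via a stable log-sum-exp."""
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))

def soft_policy(q_values, alpha):
    """Maximum-entropy (Boltzmann) policy pi(a) = exp((Q(a) - V) / alpha)."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]

def entropy(probs):
    """Shannon entropy of a discrete policy, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def actor_loss(probs, q_values, alpha):
    """Entropy-regularized actor loss sum_a pi(a) * (alpha * log pi(a) - Q(a)); lower is better."""
    return sum(p * (alpha * math.log(p) - q) for p, q in zip(probs, q_values) if p > 0)

# Hypothetical Q-values for three candidate channels.
q = [1.0, 0.9, 0.2]
for alpha in (0.01, 1.0):
    pi = soft_policy(q, alpha)
    print(alpha, [round(p, 3) for p in pi], round(entropy(pi), 3))
```

As `alpha` shrinks, the policy collapses onto the greedy channel. A larger `alpha` keeps probability on near-optimal channels, so the agent keeps exploring them instead of locking into one deterministic choice. Under the same regularized objective, the soft policy also scores strictly better than the greedy one-hot policy.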
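The average collision rate used in the comparison can be computed from the channel choices made in each time slot. The sketch below assumes that a secondary user's transmission counts as a collision whenever another secondary user chose the same channel in that slot or a primary user occupies it. This definition and the names `slot_outcome` and `average_collision_rate` are hypothetical and are not taken from the thesis.

```python
from collections import Counter

def slot_outcome(choices, pu_busy):
    """Count successful and colliding transmissions in one time slot.

    choices: dict mapping each secondary user to a channel index, or None if it stays silent.
    pu_busy: set of channel indices occupied by primary users in this slot.
    """
    load = Counter(ch for ch in choices.values() if ch is not None)
    successes = collisions = 0
    for ch in choices.values():
        if ch is None:
            continue  # no transmission, so neither success nor collision
        if ch in pu_busy or load[ch] > 1:
            collisions += 1
        else:
            successes += 1
    return successes, collisions

def average_collision_rate(slots):
    """Fraction of all transmission attempts, across slots, that collided."""
    attempts = collided = 0
    for choices, pu_busy in slots:
        s, c = slot_outcome(choices, pu_busy)
        attempts += s + c
        collided += c
    return collided / attempts if attempts else 0.0

slots = [
    ({"su1": 0, "su2": 0, "su3": 2}, {2}),       # su1/su2 clash; su3 hits a PU
    ({"su1": 1, "su2": None, "su3": 3}, set()),  # two clean transmissions
]
print(average_collision_rate(slots))  # 3 collisions out of 5 attempts -> 0.6
```

Dividing by transmission attempts rather than by users means that a user staying silent lowers neither the numerator nor the denominator. Other normalizations, such as per user per slot, are equally plausible, and they would change the absolute numbers.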