The rapid growth in the number of users will lead to an explosion of traffic, so how to allocate limited resources intelligently and efficiently in communication systems has become a hot research topic. Non-orthogonal multiple access (NOMA) has recently been regarded as a powerful technology for the efficient use of resources: by introducing non-orthogonality, it improves spectral efficiency and supports massive access. However, NOMA receivers rely on successive interference cancellation (SIC) to decode the signals of multiple users, which incurs additional time and computational cost, and the massive-connection performance of NOMA is strongly affected by the resource allocation strategy. Therefore, this thesis proposes a resource allocation scheme for a hybrid multiple access system that combines orthogonal multiple access (OMA) and NOMA. Specifically, it studies resource allocation based on deep reinforcement learning (DRL) and transfer learning (TL) in the hybrid multiple access system, taking into account user mobility, the variability of the environment, and the sum-rate of the system. The main contributions of this thesis are summarized as follows.

Firstly, a resource allocation scheme based on multi-agent DRL (MA-DRL) is proposed for the uplink multi-cell hybrid multiple access system. The scheme first uses the multi-agent DRL algorithm to allocate subcarrier resources: taking the sum-rate of all users as the objective, the DRL controller intelligently selects the access mode for each user. It then allocates power to the users according to the subcarrier allocation results and the environment information, and finally optimizes and updates the network model according to the feedback from the environment. Simulation results show that the MA-DRL-based scheme achieves a higher sum-rate than both the distributed learning (DL) scheme and the centralized learning (CL) scheme.

In addition, an adaptive mechanism named the learning determiner is introduced into the proposed allocation scheme, so that the DRL agent can interact with the communication system and adjust the speed of network optimization according to the time-varying information of the communication environment, which makes network training more efficient. Drawing on ideas from curriculum learning, the learning determiner adaptively adjusts the training speed of the DRL agent: when the environment changes greatly, several training rounds are performed, and when the environment changes little, the number of training rounds is reduced. Moreover, the learning determiner selects memory blocks containing more environmental information from the DRL memory pool, instead of sampling memory blocks at random as in previous work. Simulation results show that the learning determiner mechanism makes the neural network converge faster and achieve a higher sum-rate.
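As a rough illustration of how such a learning determiner could operate, the following Python sketch adapts the number of training rounds to the magnitude of the environment change and prefers the most informative memory blocks; the class name, the change metric, and the informativeness score are illustrative assumptions, not the exact design used in the thesis.

```python
# Hypothetical sketch of a "learning determiner"; names and scoring rules are
# illustrative assumptions rather than the thesis implementation.
import numpy as np


class LearningDeterminer:
    """Adapt the DRL agent's training effort to how much the environment changed,
    and prefer memory blocks that carry more environmental information."""

    def __init__(self, min_rounds=1, max_rounds=8, change_scale=1.0):
        self.min_rounds = min_rounds
        self.max_rounds = max_rounds
        self.change_scale = change_scale
        self.prev_env = None

    def training_rounds(self, env_state):
        """More rounds when the environment (e.g., channel gains) changes a lot,
        fewer rounds when it is nearly static."""
        env_state = np.asarray(env_state, dtype=float)
        if self.prev_env is None:
            change = 1.0  # first observation: train at the nominal level
        else:
            change = np.linalg.norm(env_state - self.prev_env) / self.change_scale
        self.prev_env = env_state
        rounds = round(self.min_rounds + change * (self.max_rounds - self.min_rounds))
        return int(np.clip(rounds, self.min_rounds, self.max_rounds))

    def sample_informative(self, memory, scores, batch_size):
        """Pick the memory blocks with the highest informativeness score
        (a placeholder score, e.g., |TD error|), instead of uniform sampling."""
        idx = np.argsort(scores)[-batch_size:]
        return [memory[i] for i in idx]
```

In a training loop, the agent would call `training_rounds` once per environment update and perform that many learning steps on the memory blocks returned by `sample_informative`.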
Secondly, TL is introduced into the MA-DRL framework, yielding an allocation scheme based on transfer reinforcement learning (T-DRL). This scheme significantly improves the convergence speed of the network when the structure of the environment changes, so that training does not have to restart from scratch when the allocation network is deployed in a new environment. To cope with different new environments, three transfer methods are proposed: transferring the subcarrier allocation network, transferring the power allocation network, and transferring both the subcarrier and power allocation networks. Simulation results show that all three methods converge faster than the plain MA-DRL-based allocation scheme in the reconstructed communication environment.
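As a rough illustration of the three transfer options, the following Python sketch (assuming PyTorch policy networks; all module and parameter names are illustrative, not the thesis implementation) copies the trained subcarrier allocation network, the power allocation network, or both from a source agent into a target agent deployed in the new environment.

```python
# Hypothetical sketch of the three transfer methods; architecture and names are
# illustrative assumptions, not the thesis code.
import copy
import torch.nn as nn


def build_allocator(in_dim, out_dim):
    # Placeholder policy network used for both subcarrier and power allocation.
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))


def transfer(source_agent, target_agent, mode="both"):
    """Initialize the target agent's networks from a trained source agent.

    mode = "subcarrier" : transfer only the subcarrier allocation network
    mode = "power"      : transfer only the power allocation network
    mode = "both"       : transfer both networks
    """
    if mode in ("subcarrier", "both"):
        target_agent["subcarrier"].load_state_dict(
            copy.deepcopy(source_agent["subcarrier"].state_dict()))
    if mode in ("power", "both"):
        target_agent["power"].load_state_dict(
            copy.deepcopy(source_agent["power"].state_dict()))
    return target_agent


# Example: reuse an agent trained in the old environment when the environment
# structure changes, instead of retraining from scratch.
source = {"subcarrier": build_allocator(16, 4), "power": build_allocator(16, 10)}
target = {"subcarrier": build_allocator(16, 4), "power": build_allocator(16, 10)}
target = transfer(source, target, mode="both")
```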