Nowadays, with the rapid development of the Internet, people benefit increasingly from the convenience and entertainment it brings; however, this has caused enormous growth in communication data traffic. The system capacity of traditional orthogonal multiple access (OMA) technologies is approaching saturation due to limited resources. Non-orthogonal multiple access (NOMA) enables multiple users to communicate simultaneously on a single subchannel by means of superposition coding and successive interference cancellation (SIC), which significantly improves spectrum efficiency and the system's user capacity. However, even with NOMA, resources such as subcarriers and power remain limited in practice, so resource allocation in NOMA systems has become a critical research challenge.

In this paper, deep reinforcement learning (DRL) is applied to the resource allocation task of a multi-carrier NOMA system to maximize its energy efficiency (EE) by carefully allocating subchannels, power, and other resources. In Chapter 3, a DRL mechanism for subchannel allocation is designed, with a genetic algorithm (GA) performing the power allocation as part of the reward mechanism. In Chapter 4, a DRL-based joint subchannel and power allocation scheme is designed by replacing the GA with DRL in the power allocation process, further reducing the computational complexity. The details are as follows.

First, this paper introduces the essential background and basic concepts of NOMA and reinforcement learning. A multi-carrier NOMA system model is then developed, and the objective function to be optimized is formulated. The matching problem between subchannels and users is transformed into a K-step decision problem suitable for DRL processing, and the corresponding elements (state, action, and reward) are designed. The REINFORCE algorithm is employed for subchannel assignment, and an invalid-action mask is used to exclude infeasible actions during the assignment process. So that the masked REINFORCE network can attain the optimal or near-optimal system energy efficiency, a GA-based power allocation is employed to generate the reward. Specifically, during training, given the subchannel assignment produced by the network, the GA solves for an optimal or near-optimal power allocation, the maximum total system EE for the current subchannel matching is computed, and this value is fed back iteratively to the masked REINFORCE network for training. Simulation results demonstrate that the proposed subchannel assignment algorithm effectively improves the total system energy efficiency.

With known channel conditions and subcarrier allocation results, solving the power allocation at every training step requires GA iterations, whose cumulative cost over the whole training process is very high. For this reason, this paper designs a DRL method for the power allocation problem, employing REINFORCE with a baseline. The algorithm learns a policy through interaction with the environment and then obtains an optimal or near-optimal power allocation according to the learned policy. Finally, REINFORCE with baseline is combined with the REINFORCE-based subcarrier allocation algorithm to jointly allocate subchannels and power. Simulations in Python verify that the DRL-based method designed in this paper has significantly lower complexity than the genetic algorithm, while achieving higher system EE than fixed allocation and the fractional transmit power allocation (FTPA) algorithm.
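To make the optimization target concrete, the following is a minimal sketch of how per-user rates and system EE could be computed on one downlink NOMA subchannel under ideal SIC. It assumes a fixed noise level, a hypothetical circuit-power term, and SIC decoding ordered by channel gain; the paper's actual system model, parameters, and multi-carrier extension are not reproduced here.

```python
import numpy as np

def noma_subchannel_rates(gains, powers, bandwidth=1e6, noise_psd=1e-9):
    """Downlink NOMA achievable rates on one subchannel with ideal SIC.

    Users are processed in decreasing channel-gain order: each user cancels
    the signals of weaker users, so only stronger users' power remains as
    interference at its receiver.
    """
    gains = np.asarray(gains, dtype=float)
    powers = np.asarray(powers, dtype=float)
    order = np.argsort(-gains)                 # strongest channel first
    rates = np.zeros_like(gains)
    for rank, u in enumerate(order):
        # residual interference: signals of users with stronger channels
        interference = powers[order[:rank]].sum() * gains[u]
        sinr = powers[u] * gains[u] / (interference + noise_psd * bandwidth)
        rates[u] = bandwidth * np.log2(1.0 + sinr)
    return rates

def energy_efficiency(rates, powers, circuit_power=0.1):
    """Total EE = sum rate / (transmit power + circuit power); circuit_power
    is a hypothetical constant, not taken from the paper."""
    return np.sum(rates) / (np.asarray(powers, dtype=float).sum() + circuit_power)
```

Summing this EE over all subchannels gives the kind of objective the DRL agent is trained to maximize.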
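The K-step assignment with an invalid-action mask can be sketched as follows. This is an illustrative toy instance, not the paper's network: the policy is a bare logit table updated by a plain REINFORCE step, the problem size (4 users, 2 subchannels, at most 2 users per subchannel) is invented, and `reward_fn` stands in for the GA-based EE evaluation that the paper uses as the reward.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_softmax(logits, mask):
    """Action probabilities with invalid actions (mask == 0) forced to zero."""
    z = np.where(mask.astype(bool), logits, -np.inf)
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# hypothetical tiny instance: K users assigned one by one to N subchannels,
# each subchannel holding at most CAP users (NOMA multiplexing)
K, N, CAP = 4, 2, 2
theta = np.zeros((K, N))                       # one logit row per decision step

def run_episode(reward_fn):
    load = np.zeros(N, dtype=int)
    grad = np.zeros_like(theta)                # sum of d log pi / d logits
    assignment = []
    for k in range(K):
        mask = (load < CAP).astype(float)      # invalid-action mask: full subchannels
        p = masked_softmax(theta[k], mask)
        a = rng.choice(N, p=p)
        assignment.append(int(a))
        load[a] += 1
        g = -p
        g[a] += 1.0                            # softmax log-likelihood gradient
        grad[k] = g
    reward = reward_fn(assignment)             # in the paper: GA-optimized system EE
    return assignment, reward, grad

def reinforce_step(reward_fn, lr=0.1):
    """One REINFORCE update: scale log-probability gradients by the reward."""
    global theta
    _, reward, grad = run_episode(reward_fn)
    theta = theta + lr * reward * grad
    return reward
```

Chapter 4's variant subtracts a learned baseline from `reward` before the update, which reduces gradient variance without changing the expected update direction.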
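For reference, the FTPA baseline used in the comparison allocates a subchannel's power budget in inverse proportion to a fractional power of each user's channel gain, so weaker users receive more power. A minimal sketch (the decay factor `alpha` here is an illustrative value, not the paper's setting):

```python
import numpy as np

def ftpa_power(gains, total_power, alpha=0.4):
    """Fractional transmit power allocation on one subchannel.

    p_k is proportional to gain_k ** (-alpha): alpha = 0 gives an equal split,
    larger alpha shifts more power toward users with weaker channels.
    """
    g = np.asarray(gains, dtype=float)
    weights = g ** (-alpha)
    return total_power * weights / weights.sum()
```

Unlike the DRL scheme, FTPA follows this fixed closed-form rule and cannot adapt its split to maximize EE, which is why it serves as a baseline.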