| Device-to-device (D2D) communication is considered a key technology for increasing data transmission rates, reducing latency, and lowering energy consumption. D2D communication allows two nearby cellular users to establish a direct communication link that does not pass through a base station (BS). D2D communication based on cellular networks is a promising technology for improving spectrum efficiency. However, D2D transmission causes serious interference to cellular links and to other D2D links, which poses great technical challenges for the allocation of communication resources. Existing centralized solutions require global information, which incurs additional signaling overhead in the network, while existing distributed solutions require frequent information exchange between D2D users and cannot achieve global optimization. Therefore, a reasonable allocation of communication resources is of great significance for improving the capacity and energy efficiency of D2D cellular networks. To address the channel selection problem of D2D users, this paper proposes a distributed channel matching algorithm. D2D users complete the channel matching task independently, without BS control, which improves their communication capability in large-scale cellular networks. Moreover, D2D users do not need to exchange information with other users, such as the selected communication channel or channel state information (CSI). In a random environment, each D2D user is a learner whose task is to learn the best channel matching strategy based on the observed reward and the channel state (the channel gain obtained through the pilot signal). To address the joint optimization of channel matching and communication power, this paper proposes a distributed communication resource matching algorithm that autonomously completes the joint selection of channel and communication power level without knowing the information of other users. While cellular user communication is guaranteed, the total throughput and the energy efficiency of D2D users in complex cellular networks are improved, and the performance of the algorithm is verified by simulation. In addition, we propose a deep reinforcement learning framework for the D2D communication network that separates learners from actors. The actor is responsible for collecting experience and executing the learned strategy. The learner is responsible for learning strategies from historical experience and passing the learned strategies to the actors. In this way, D2D users only need to collect experience and execute strategies, while the learning algorithm is handed over to a more powerful data processing center. This framework can effectively reduce the communication delay caused by strategy learning at D2D users and, at the same time, improve the ability to learn over large state spaces. |
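
The idea of each D2D user learning a channel choice from its own observed reward, without BS control or information exchange, can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper: it swaps in a plain epsilon-greedy bandit learner, ignores channel state and power levels, and assumes a toy reward (1 if a user is alone on its channel, 0 otherwise). The names `ChannelLearner` and `simulate`, and all parameter values, are hypothetical.

```python
import random


class ChannelLearner:
    """One D2D user that learns a channel choice from its own rewards only."""

    def __init__(self, n_channels, epsilon=0.1, seed=None):
        self.n_channels = n_channels
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * n_channels    # times each channel was tried
        self.values = [0.0] * n_channels  # running mean reward per channel

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_channels)
        return max(range(self.n_channels), key=lambda c: self.values[c])

    def update(self, channel, reward):
        # Incremental mean; uses only this user's own observation.
        self.counts[channel] += 1
        self.values[channel] += (reward - self.values[channel]) / self.counts[channel]


def simulate(n_users=3, n_channels=3, rounds=2000, seed=0):
    """Toy environment (assumed): reward is 1.0 only if a user is alone on its channel."""
    learners = [ChannelLearner(n_channels, seed=seed + i) for i in range(n_users)]
    choices = []
    for _ in range(rounds):
        choices = [learner.select() for learner in learners]
        for learner, channel in zip(learners, choices):
            reward = 1.0 if choices.count(channel) == 1 else 0.0
            learner.update(channel, reward)
    return learners, choices


if __name__ == "__main__":
    learners, last_choices = simulate()
    print("channels chosen in the final round:", last_choices)
```

Each learner keeps only its own counts and reward estimates, which mirrors the "no information exchange" property described in the abstract. A state-dependent or deep reinforcement learning variant would replace the per-channel mean with a value function over the observed channel gains.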