The contradiction between the scarcity of spectrum resources and the rapid growth in the number of mobile devices is a major problem that next-generation cellular networks urgently need to solve. Device-to-Device (D2D) communication enables direct communication between nearby users by reusing cellular spectrum resources without routing traffic through the base station, so it can relieve spectrum pressure on cellular networks, increase transmission rates, and reduce energy consumption. However, when D2D users reuse cellular spectrum resources, they inevitably interfere with other users in the cellular network, which limits further performance gains. Therefore, designing an efficient resource optimization scheme that improves spectral efficiency and reduces inter-user interference is the key to high-quality D2D communication. As an important application of D2D communication in the Internet of Vehicles (IoV), vehicle-to-vehicle (V2V) communication plays an important role in IoV safety, but it also faces stringent latency and reliability requirements.

This paper focuses on resource optimization algorithms for cellular D2D communication. The main research work is as follows.

Firstly, this paper proposes a D2D communication resource optimization scheme based on Reinforcement Learning (RL). The scheme divides users into Cellular User Equipment (CUE), which communicates directly with the base station, and D2D User Equipment (DUE), which uses D2D communication, and adopts a policy-gradient-based Actor-Critic reinforcement learning algorithm. In this algorithm, the state is defined as the set of the signal-to-interference-plus-noise ratios (SINRs) of the CUEs and the channel selection parameters of the D2D users. The action is defined as the set of the number of channels and the transmit power of each user. The reward function is the throughput each user obtains after taking the action. The Actor follows a parameterized stochastic policy to output continuous actions, while the Critic estimates the value of the policy and evaluates the Actor's actions. Simulation results show that this algorithm achieves clearly higher throughput than the comparison algorithms.

Secondly, this paper proposes a resource optimization scheme for V2V communication in vehicular ad hoc networks (VANETs). In this scenario, V2V User Equipment (VUE) reuses the uplink spectrum resources of the CUEs. The proposed scheme uses a fuzzy clustering algorithm to place VUEs with large differences in attribute values into different clusters, thereby reducing interference. The spectrum sharing problem is modelled as a weighted three-dimensional matching problem, and a set of resource allocation algorithms is proposed to solve it while handling the complex performance trade-offs involved. Simulation results show that the proposed scheme maximizes the total capacity of the CUEs while ensuring the reliability of the VUE communication links.
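In the RL-based scheme, the state contains CUE SINRs and the reward is the throughput each user obtains. The following is a minimal Python sketch of how such SINR and Shannon-throughput quantities could be computed when one DUE pair reuses a single CUE uplink channel. The single-channel setting, the function names (`sinr`, `rate_bps`, `reuse_reward`), the channel gains, the 180 kHz bandwidth and the 1e-13 W noise power are all illustrative assumptions, not parameters taken from this paper.

```python
import math


def sinr(signal_w: float, interference_w: float, noise_w: float) -> float:
    """Linear-scale SINR: received signal power over interference plus noise."""
    return signal_w / (interference_w + noise_w)


def rate_bps(bandwidth_hz: float, sinr_linear: float) -> float:
    """Shannon throughput in bit/s: B * log2(1 + SINR)."""
    return bandwidth_hz * math.log2(1.0 + sinr_linear)


def reuse_reward(p_cue_w, p_due_w, g_cue_bs, g_due, g_due_bs, g_cue_due,
                 bandwidth_hz=180e3, noise_w=1e-13):
    """Sum throughput of one CUE and one DUE pair sharing one uplink channel.

    Hypothetical gains:
      g_cue_bs  CUE transmitter -> base station (desired link)
      g_due     DUE transmitter -> DUE receiver (desired link)
      g_due_bs  DUE transmitter -> base station (interference on the CUE)
      g_cue_due CUE transmitter -> DUE receiver (interference on the DUE)
    """
    sinr_cue = sinr(p_cue_w * g_cue_bs, p_due_w * g_due_bs, noise_w)
    sinr_due = sinr(p_due_w * g_due, p_cue_w * g_cue_due, noise_w)
    total = rate_bps(bandwidth_hz, sinr_cue) + rate_bps(bandwidth_hz, sinr_due)
    return total, sinr_cue, sinr_due


if __name__ == "__main__":
    # Raising the DUE power adds D2D throughput but lowers the CUE SINR.
    for p_due in (0.0, 0.05, 0.1):
        total, s_cue, s_due = reuse_reward(0.2, p_due, 1e-10, 1e-9, 1e-11, 1e-11)
        print(f"p_due={p_due:.2f} W  reward={total / 1e6:.3f} Mbit/s  "
              f"SINR_cue={s_cue:.1f}  SINR_due={s_due:.1f}")
```

The returned pair of SINRs shows the interference coupling that the learned policy has to balance: any power the DUE adds appears in the denominator of the CUE's SINR.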
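The Actor-Critic idea described above (a parameterized stochastic policy producing continuous actions, with a Critic evaluating them) can be illustrated with a one-step actor-critic using linear function approximation and a Gaussian policy. This is a toy sketch, not the algorithm of this paper: it assumes a scalar state (a normalized CUE SINR), a single continuous action (a normalized DUE transmit power, ignoring the channel-count part of the action), a hypothetical reward function `toy_throughput`, and i.i.d. state transitions. All names and step sizes are illustrative.

```python
import math
import random

P_MAX = 1.0  # normalized maximum DUE transmit power (assumption)


def features(cue_quality):
    """Hypothetical state features: bias term plus a normalized CUE SINR in [0, 1]."""
    return [1.0, cue_quality]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def clip_power(p):
    """The environment only accepts powers in [0, P_MAX]."""
    return min(max(p, 0.0), P_MAX)


def toy_throughput(power, cue_quality):
    """Hypothetical reward: DUE rate rises with its power, CUE rate falls with interference."""
    due_rate = math.log2(1.0 + 10.0 * power)
    cue_rate = math.log2(1.0 + 20.0 * cue_quality / (1.0 + 30.0 * power))
    return due_rate + cue_rate


class GaussianActorCritic:
    """One-step actor-critic with linear function approximation and a Gaussian policy."""

    def __init__(self, n_features, sigma=0.2, alpha_actor=0.005,
                 alpha_critic=0.05, gamma=0.9):
        self.theta = [0.0] * n_features  # actor: policy mean is theta . phi(s)
        self.w = [0.0] * n_features      # critic: V(s) is w . phi(s)
        self.sigma = sigma               # fixed exploration standard deviation
        self.alpha_actor = alpha_actor
        self.alpha_critic = alpha_critic
        self.gamma = gamma

    def act(self, phi, rng):
        # Sample a continuous action from N(theta . phi, sigma^2).
        return rng.gauss(dot(self.theta, phi), self.sigma)

    def update(self, phi, action, reward, phi_next):
        # Critic's TD error: delta = r + gamma * V(s') - V(s).
        delta = reward + self.gamma * dot(self.w, phi_next) - dot(self.w, phi)
        # Critic: semi-gradient TD(0) step on the value weights.
        self.w = [w + self.alpha_critic * delta * f for w, f in zip(self.w, phi)]
        # Actor: policy-gradient step, grad log pi(a|s) = (a - mu) / sigma^2 * phi(s).
        mu = dot(self.theta, phi)
        scale = self.alpha_actor * delta * (action - mu) / self.sigma ** 2
        self.theta = [t + scale * f for t, f in zip(self.theta, phi)]
        return delta


def train(steps=2000, seed=0):
    rng = random.Random(seed)
    agent = GaussianActorCritic(n_features=2)
    state = rng.random()
    for _ in range(steps):
        phi = features(state)
        action = agent.act(phi, rng)  # unclipped sample is used for the gradient
        reward = toy_throughput(clip_power(action), state)
        next_state = rng.random()  # toy dynamics: CUE channel quality redrawn each step
        agent.update(phi, action, reward, features(next_state))
        state = next_state
    return agent
```

After `agent = train()`, the learned mean power for a state `s` is `clip_power(dot(agent.theta, features(s)))`. Using the Critic's TD error as the Actor's learning signal is what distinguishes this from plain REINFORCE, which would instead wait for full Monte Carlo returns.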
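The V2V scheme separates VUEs with very different attribute values into different clusters using fuzzy clustering. The sketch below shows standard fuzzy c-means (FCM), one common fuzzy clustering algorithm; this paper does not state that exactly this variant is used. The VUE attributes in the usage note (normalized speed and position) are hypothetical, as are the function names `fuzzy_c_means` and `hard_labels`.

```python
import math
import random


def _dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def fuzzy_c_means(points, n_clusters, m=2.0, iters=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, memberships) with memberships[i][j] = u_ij."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Center update: c_j = sum_i u_ij^m x_i / sum_i u_ij^m.
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(n)]
            w_sum = sum(weights)
            centers.append([sum(w * p[d] for w, p in zip(weights, points)) / w_sum
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        max_change = 0.0
        for i, p in enumerate(points):
            dists = [max(_dist(p, c), 1e-12) for c in centers]  # avoid division by zero
            new_row = [1.0 / sum((dists[j] / dists[k]) ** (2.0 / (m - 1.0))
                                 for k in range(n_clusters))
                       for j in range(n_clusters)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if max_change < tol:
            break
    return centers, u


def hard_labels(u):
    """Assign each point to the cluster in which it has the highest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]
```

For example, with hypothetical attribute vectors `vues = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]`, `hard_labels(fuzzy_c_means(vues, 2)[1])` puts the first three VUEs in one cluster and the last three in the other. Unlike k-means, each VUE keeps a graded membership in every cluster, which a later resource-allocation step could use as a soft weight.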
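The spectrum sharing problem is modelled as a weighted three-dimensional matching problem. As an illustration only, the sketch below assumes the three dimensions are CUEs, VUE clusters and resource blocks, with each feasible triple weighted by some capacity gain; these dimensions, the weights and the function name `greedy_3d_matching` are assumptions, and the greedy heuristic shown is a generic baseline, not the resource allocation algorithms proposed in this paper.

```python
from typing import Dict, List, Tuple

# A candidate triple: (CUE id, VUE-cluster id, resource-block id). All ids are hypothetical.
Triple = Tuple[str, str, str]


def greedy_3d_matching(weights: Dict[Triple, float]) -> List[Triple]:
    """Greedy heuristic for weighted 3-D matching.

    Scans triples from heaviest to lightest and keeps a triple only if none of its
    three elements has already been matched. Infeasible triples (for example, ones
    that would violate a VUE reliability constraint) are assumed to be left out of
    `weights` beforehand.
    """
    used_cue, used_cluster, used_rb = set(), set(), set()
    matching = []
    for (cue, cluster, rb), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        if w <= 0:
            break  # remaining triples cannot increase the total weight
        if cue in used_cue or cluster in used_cluster or rb in used_rb:
            continue
        matching.append((cue, cluster, rb))
        used_cue.add(cue)
        used_cluster.add(cluster)
        used_rb.add(rb)
    return matching
```

With hypothetical weights `{("C1", "V1", "R1"): 5.0, ("C1", "V2", "R2"): 4.0, ("C2", "V2", "R2"): 3.5, ("C2", "V1", "R2"): 1.0}`, the heuristic returns `[("C1", "V1", "R1"), ("C2", "V2", "R2")]` with total weight 8.5. Weighted 3-D matching is NP-hard, and this greedy rule is known to reach at least one third of the optimal total weight, which is why more careful algorithms are worth designing for it.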