The Internet of Vehicles can comprehensively improve vehicle interconnection and intelligent transportation services such as road safety, traffic efficiency, and Internet access. However, the vehicular communication network is highly dynamic, and the characteristics of the wireless channel change with the complex traffic environment formed by pedestrians, buildings, obstacles, and vehicles on the road. As a result, the management of resources such as wireless spectrum and transmit power is strongly affected, and communication performance faces many challenges. Effectively designing resource management for vehicle-to-everything (V2X) communication is therefore essential. For the scenario in which multiple vehicle-to-vehicle (V2V) links reuse the spectrum resources of vehicle-to-infrastructure (V2I) links in a Single Input Single Output (SISO) cellular network, this paper studies the management and optimization of V2X communication resources to improve the communication performance of the system. Because of the uncertainty inherent in V2X resource management and the diversity of service requirements, the problem is difficult to model and optimize accurately with traditional algorithms, so this paper adopts a model-free solution based on deep Q-learning. The solution first models the joint management of wireless spectrum allocation and power control as a multi-agent reinforcement learning (MARL) problem, and then applies a deep Q-learning (DQL) method to solve the optimization problem. The main research work of this paper covers the following three aspects.

1. Research and simulation of a V2X communication resource management scheme based on multi-agent deep Q-learning with centralized decision-making and distributed execution. The scheme first uses a deep neural network (DNN) at each local V2V link to compress its local observations, which are then fed back to the base station (BS) for centralized decision-making; this reduces the heavy signaling overhead that would arise if every V2V link transmitted a large amount of raw local environment information to the BS for centralized processing. The BS then applies deep Q-learning to make centralized decisions and broadcasts the results to each V2V link for execution. Finally, a weighted-sum reward is used to dynamically balance V2I and V2V performance. Simulation results show that the scheme trains well and that the total system capacity reaches 97.3% of the optimal performance. Moreover, adjusting the reward weights tunes the total channel capacity of the V2I links while having a negligible impact on the V2V links.

2. Research and simulation of a V2X communication resource management scheme based on multi-agent deep Q-learning with distributed decision-making and execution. The scheme first models the problem of multiple V2V links reusing V2I spectrum resources as a distributed multi-agent deep Q-learning problem. Each V2V link then solves it with a Deep Q-Network (DQN) optimization method suited to distributed implementation: the V2V links (agents) interact with the unknown communication environment to obtain local observations, take actions, and receive a common system-level reward, learning to improve their spectrum and transmit-power management strategies by using the gained experience to update their Q-networks (see the sketch below). Simulation results show that, with an appropriate reward design and training mechanism, multiple V2V links can learn to cooperate in a distributed manner, increasing both the channel capacity of the V2I links and the payload delivery rate of the V2V links, thereby improving the communication performance of the system.
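To make the distributed training step of research content 2 concrete, the sketch below shows per-link DQN agents that act on local observations and update their own Q-networks from a shared system-level reward. This is a minimal illustration only: the dimensions, the PyTorch framing, and the placeholder observations and reward are assumptions (the paper's V2X simulator would supply the real quantities), and experience replay and target networks are omitted for brevity.

```python
import random
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration: each V2V agent observes a local
# state (e.g., channel gains, interference, remaining payload) and picks a
# discrete action encoding a (sub-band, transmit-power) pair.
OBS_DIM, N_ACTIONS, N_AGENTS = 16, 12, 4

class QNet(nn.Module):
    """Per-agent Q-network mapping a local observation to action values."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS))
    def forward(self, obs):
        return self.net(obs)

agents = [QNet() for _ in range(N_AGENTS)]
optims = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in agents]
gamma, eps = 0.95, 0.1

def select_action(qnet, obs):
    """Epsilon-greedy action over the agent's own Q-values."""
    if random.random() < eps:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(qnet(obs).argmax())

# One illustrative step with placeholder observations and a shared reward.
obs = [torch.randn(OBS_DIM) for _ in range(N_AGENTS)]
acts = [select_action(a, o) for a, o in zip(agents, obs)]
reward = torch.tensor(1.0)                # common system-level reward
next_obs = [torch.randn(OBS_DIM) for _ in range(N_AGENTS)]

for a, opt, o, act, o2 in zip(agents, optims, obs, acts, next_obs):
    q = a(o)[act]                         # Q(s, a) for the taken action
    with torch.no_grad():
        target = reward + gamma * a(o2).max()   # one-step TD target
    loss = (q - target).pow(2)            # squared TD error
    opt.zero_grad(); loss.backward(); opt.step()
```

Because each agent learns from its own observations and a common reward, no raw channel information has to be exchanged between links at execution time, which is what makes the scheme distributed.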
3. Proposal and simulation of a V2X communication resource management scheme based on Federated Learning (FL) for multi-agent deep Q-learning. This scheme targets the long training time, slow convergence, and lack of global environment information that characterize distributed decision-making, as well as the short training time and fast convergence but high signaling overhead of centralized decision-making, and proposes a federated-learning-based distributed collaboration method for deep Q-learning. First, each V2V link runs n rounds of the Markov Decision Process (MDP) training locally using deep Q-learning, and all V2V links feed the resulting Q-network model parameters back to the FL server. The FL server then averages all local model parameters into global model parameters and shares them with all V2V links to help the links cooperate (see the sketch below). Finally, each V2V link uses the global parameters to update its local training Q-network, performs the next n rounds of MDP training, and repeats this loop until the model converges. Simulation results show that, compared with the scheme of research content 2, this scheme trains better, achieves better V2I-link and V2V-link performance, and is robust to changes in the payload size.
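The aggregation step of research content 3 reduces to a FedAvg-style parameter average followed by a broadcast. The sketch below assumes the per-agent `QNet` from the previous sketch; `fedavg`, `train_locally`, and `n_rounds` are hypothetical stand-ins, not the paper's actual interfaces.

```python
import copy
import torch

def fedavg(local_models):
    """Average the parameters of all local Q-networks (FedAvg-style)."""
    global_state = copy.deepcopy(local_models[0].state_dict())
    for key in global_state:
        global_state[key] = torch.stack(
            [m.state_dict()[key] for m in local_models]).mean(dim=0)
    return global_state

def federated_round(agents, train_locally, n_rounds):
    """One FL round: local DQL training, server aggregation, broadcast."""
    for agent in agents:                  # n rounds of local MDP training
        train_locally(agent, n_rounds)
    global_state = fedavg(agents)         # FL server averages parameters
    for agent in agents:                  # broadcast global model to links
        agent.load_state_dict(global_state)
```

Only model parameters cross the air interface in this loop, so the signaling cost sits between fully centralized decision-making and purely local training, which is the trade-off the scheme exploits.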