| In future vehicular networks, self-driving vehicles will transmit large amounts of sensor data to an edge cloud platform for real-time processing in order to ensure driving safety. With its high data rate and low latency, millimeter-wave (mmWave) communication is expected to become an important technology for these networks. Although performance optimization has been studied extensively for the single mmWave base station (mmBS) scenario with the goal of maximizing communication throughput, making performance optimization decisions in mmWave vehicular networks with multiple mmBSs remains very challenging. On the one hand, because complexity grows exponentially, it is not feasible to perform beam selection at a central controller with global network information. On the other hand, distributed solutions may suffer interference from overlapping beams of different mmBSs, resulting in severe throughput degradation. In addition, the impact of transmit power on system performance cannot be ignored, so a suitable power allocation strategy is needed. Because simultaneously selecting overlapping beams causes a significant drop in throughput, this paper first formulates and solves a joint beam selection problem for multiple mmBSs. A modified Q-learning algorithm handles coordination between mmBSs so that overlapping beams are selected simultaneously as rarely as possible, and a modified CUCB algorithm resolves the "exploration-exploitation" dilemma that a single mmBS faces when selecting its beam set. On this basis, a multi-agent reinforcement learning-based beam selection algorithm, MARL-BS, is proposed, which greatly improves system throughput while avoiding, as far as possible, the simultaneous selection of overlapping beams. However, MARL-BS does not consider the impact of power on system throughput. Therefore, this paper further proposes a beam power allocation strategy based on the Karush-Kuhn-Tucker (KKT) conditions. The overall multi-mmBS joint beam selection and power allocation problem is decomposed step by step, and the optimal solution is obtained by solving two sub-problems, joint beam selection and beam power allocation, so that the influence of both on system throughput is considered jointly. In the KKT-based beam power allocation algorithm, the KKT conditions are written out and solved for the optimal beam power allocation. Combining the two algorithms yields JBSPA, a multi-mmBS joint beam selection and power allocation algorithm that optimizes mmWave vehicular network performance progressively, first through beam selection and then through power allocation. Finally, experimental results show that MARL-BS performs better than the benchmark algorithms, but because it optimizes system performance only from the perspective of beam selection, its results leave room for improvement. JBSPA realizes the joint optimization of beam selection and power allocation and further improves system throughput beyond that of MARL-BS. |
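
To illustrate what a KKT-based power allocation step can look like, the sketch below solves a textbook simplification rather than the formulation used in this paper, which the abstract does not state. It assumes a single mmBS whose already-selected beams have known effective channel gains `g_i`, a rate model of `log2(1 + g_i * p_i)` per beam, and a total power budget. Under those assumptions the KKT conditions give the classic water-filling rule `p_i = max(0, mu - 1/g_i)`, where the water level `mu` is chosen so that the budget is met. The names `water_filling`, `sum_rate`, `gains`, and `total_power` are hypothetical and do not come from the paper.

```python
import math


def water_filling(gains, total_power):
    """Maximize sum(log2(1 + g_i * p_i)) s.t. sum(p_i) <= total_power, p_i >= 0.

    Assumed model (not taken from the paper): each selected beam i has a
    positive effective gain g_i. The KKT conditions yield
    p_i = max(0, mu - 1/g_i) for a common water level mu.
    """
    if total_power < 0 or any(g <= 0 for g in gains):
        raise ValueError("gains must be positive and total_power non-negative")

    # Visit beams from strongest to weakest, i.e. by increasing floor 1/g_i.
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)

    water, active, floor_sum = 0.0, 0, 0.0
    for k, i in enumerate(order, start=1):
        floor = 1.0 / gains[i]
        floor_sum += floor
        # Water level if exactly the k strongest beams share the budget.
        level = (total_power + floor_sum) / k
        if level > floor:
            water, active = level, k
        else:
            break  # this beam and all weaker ones receive zero power

    powers = [0.0] * len(gains)
    for i in order[:active]:
        powers[i] = water - 1.0 / gains[i]
    return powers


def sum_rate(gains, powers):
    """Total rate in bit/s/Hz under the assumed log2(1 + g * p) model."""
    return sum(math.log2(1.0 + g * p) for g, p in zip(gains, powers))


if __name__ == "__main__":
    gains = [2.0, 1.0, 0.1]  # illustrative values only
    powers = water_filling(gains, total_power=1.0)
    print(powers, sum_rate(gains, powers))
```

In this toy setting the weakest beam is switched off and its share of the budget goes to the stronger beams, which is the intuition behind allocating power only after the beam set has been chosen.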