With the continuous development of connected and automated vehicles (CAVs), high-level decision-making for autonomous driving has attracted extensive attention. Traditional rule-based decision-making methods have serious limitations: they are complex to design and cannot easily cover all the situations that may occur in a traffic environment. With the rapid development of machine learning, reinforcement learning (RL), an adaptive learning method, has drawn considerable interest, and RL techniques have been successfully applied to the decision-making module of autonomous vehicles (AVs) in many studies. Applying RL to autonomous driving decision-making is expected to overcome the shortcomings of traditional rule-based methods.

However, at this stage, RL research on autonomous driving focuses mainly on the driving decisions of a single CAV; relatively little work addresses collaborative decision-making among vehicles. Autonomous driving is essentially a multi-agent problem, with continual interaction among the CAVs on the road. Yet most existing RL-based autonomous driving approaches simply apply general distributed RL in a multi-CAV environment to accomplish a specific maneuver: the interaction dependencies between vehicles are rarely considered, and no explicit cooperation mechanism is built into the learning process. These limitations degrade overall traffic efficiency, so cooperative RL decision-making for CAVs has substantial research value.

This paper studies the two most common expressway scenarios, straight-line driving and on-ramp merging, and addresses the cooperative driving decision-making problem of RL-controlled CAVs in both. For straight-line driving on the highway, it proposes a rule-constrained DRL coordinated driving method based on the DQN architecture; for cooperative on-ramp merging, it proposes C-PPO, an improved cooperative algorithm based on proximal policy optimization (PPO). The main contributions are as follows.

For the straight-line expressway scenario, a DRL coordinated driving method with driving-decision rule constraints (RCDQN), built on the DQN architecture, is proposed. On the one hand, it combines traditional rule-based AV control decisions with the DRL method. On the other hand, it adopts the idea of homogeneous experience sharing (HES): vehicles share experience with one another, which improves team learning efficiency and promotes collaborative learning among vehicles. Experimental evaluation shows that, compared with vehicle models that lack collaborative learning or rely solely on expert rules, this method achieves higher returns and drives faster.

For cooperative decision-making in expressway on-ramp merging, an improved cooperative DRL algorithm, C-PPO, is proposed on the basis of PPO. First, a Markov decision process (MDP) model of the CAV ramp-merging scenario is constructed, and an effective reward function is designed along four dimensions: safety, speed stability, time schedule, and ramp-merging cost. Second, a novel cooperation mechanism is designed within the Actor-Critic framework: during the policy-update process, it dynamically accounts for the policy-update information of CAVs near the ramp over multiple periods, and the advantage values are adjusted in a coordinated way to realize cooperation among the merging vehicles. Experimental results show that on the on-ramp merging problem, C-PPO significantly outperforms mainstream RL baselines, including PPO and ACKTR (Actor-Critic using Kronecker-Factored Trust Region).
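The homogeneous experience sharing (HES) idea behind RCDQN can be illustrated with a minimal sketch: since the CAVs are homogeneous, all agents can push their transitions into one shared replay buffer and train on the pooled data. The class and variable names below are illustrative assumptions, not the thesis's actual implementation.

```python
import random
from collections import deque

class SharedReplayBuffer:
    """One replay buffer shared by all homogeneous agents (HES sketch).

    Each CAV pushes its own (state, action, reward, next_state, done)
    transitions, and every agent samples minibatches from the common
    pool, so experience gathered by one vehicle speeds up the learning
    of all the others.
    """

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling over the pooled experience of all vehicles.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

# Usage: every agent holds a reference to the same buffer object.
shared = SharedReplayBuffer()
for agent_id in range(3):          # three homogeneous CAVs
    for step in range(5):          # dummy transitions for illustration
        shared.push((agent_id, step), 0, 1.0, (agent_id, step + 1), False)
batch = shared.sample(8)           # any agent trains on the pooled data
```

In a full RCDQN setup, each DQN update would additionally be masked by the rule constraints before an action is executed; only the sharing mechanism is sketched here.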
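The coordinated adjustment of advantage values in C-PPO can also be sketched. The abstract does not specify the exact formula, so the blending rule below (mixing a merging CAV's own GAE advantages with the mean advantage of nearby CAVs, weighted by an assumed coefficient `beta`) is purely an illustrative assumption of how cooperation could enter the advantage estimate.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard generalized advantage estimation for one finite trajectory."""
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        next_v = values[t + 1] if t + 1 < T else 0.0  # bootstrap ends at 0
        delta = rewards[t] + gamma * next_v - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv

def coordinated_advantages(adv_self, adv_neighbors, beta=0.5):
    """Illustrative cooperation rule (assumed, not the thesis's formula):
    blend a merging CAV's advantages with the mean advantage of nearby
    CAVs, so an action is reinforced only insofar as it also benefits
    the vehicles it must cooperate with."""
    mean_neighbor = np.mean(adv_neighbors, axis=0)
    return (1.0 - beta) * adv_self + beta * mean_neighbor

# Usage with dummy data: one ego CAV and two neighbors near the ramp.
adv = gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
coord = coordinated_advantages(adv, [adv * 0.5, adv * 1.5])
```

The coordinated values `coord` would then replace the per-agent advantages inside the clipped PPO surrogate objective during each update period.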