| With the rapid development of vehicular automation and communication technology, connected and automated vehicles (CAVs) will gradually gain a share of the vehicle market. CAVs and human-driven vehicles (HDVs) will inevitably share the same road, a situation called “mixed traffic”. Because of the uncertainty in HDV behavior, major challenges remain in modeling, control, and stability optimization for mixed platoons. This study proposes an ad-hoc cooperative control strategy, based on reinforcement learning, for mixed platoons of CAVs and HDVs. The strategy cooperatively controls the longitudinal acceleration of the CAVs in a mixed platoon segment by segment, which effectively dampens traffic oscillations and greatly improves stability, travel efficiency, and energy efficiency. The study decomposes a mixed platoon into five types of sub-platoons, each with an HDV as the leading vehicle and a different number of CAVs as the following vehicles. For the five types of sub-platoons, the study develops five models based on deep reinforcement learning that cooperatively control the longitudinal acceleration of the CAVs within each type of sub-platoon, to guarantee sub-platoon stability and achieve local optimization of traffic flow. This local optimization method simplifies the stability optimization of the mixed platoon, accounts for the diverse characteristics of mixed platoons, and greatly reduces computational complexity. Specifically, to develop the control models, the study formulates the research problem and the model construction scheme mathematically as a Markov Decision Process (MDP). Based on this scheme, the study designs a multi-objective reward function (string stability, car-following efficiency, energy efficiency) with adjustable weights and builds a corresponding training environment for each type of sub-platoon. The study thus develops two control strategies based on two different reward functions, and each control strategy consists of five trained models. The study then verifies the performance and generalization capability of the models on the NGSIM dataset. A series of mixed-platoon simulation experiments is then conducted in a low-speed scenario and a high-speed scenario. The results show that the proposed control strategy can greatly dampen the traffic oscillations caused by HDVs. Compared with an HDV platoon, the CAV platoon increases average travel efficiency by 3.87% and energy efficiency by 8.14% in the low-speed scenario. In the high-speed scenario, the travel efficiency and energy efficiency of the CAV platoon increase by 10.63% and 36.54%, respectively. In addition, the study analyzes how the ordering of CAVs and HDVs affects traffic flow at the same penetration rate and concludes that CAVs clustered downstream optimize mixed traffic flow to the greatest extent. Finally, the study compares the proposed control strategy with the ACC and CACC strategies in terms of driving efficiency, energy efficiency, stability, safety, and emissions. The results show that, compared with the CACC strategy, the proposed strategy decreases travel efficiency by 3.2% but increases safety and string stability by 46.6% and 30.6%, respectively. Emissions of CO, HC, and NOx decrease by 12.2%, 7.2%, and 29.6%, respectively. |
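
As an illustration of the multi-objective reward with adjustable weights mentioned above, the following is a minimal Python sketch. The abstract does not give the actual reward terms, so everything here is an assumption made for illustration: the names `RewardWeights` and `step_reward`, the three quadratic penalty terms, the target time headway of 1.5 s, and the weight values are all hypothetical and are not the study's formulation.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    """Adjustable weights for the three objectives (illustrative values only)."""
    stability: float = 1.0
    efficiency: float = 1.0
    energy: float = 1.0


def step_reward(speed_error, headway, accel, weights, target_headway=1.5):
    """Return a weighted multi-objective reward for one CAV at one time step.

    All three terms are hypothetical stand-ins, not the study's definitions:
      speed_error: follower speed minus predecessor speed (m/s), stability proxy
      headway:     time headway to the predecessor (s), car-following proxy
      accel:       commanded longitudinal acceleration (m/s^2), energy proxy
    """
    stability_penalty = speed_error ** 2
    efficiency_penalty = (headway - target_headway) ** 2
    energy_penalty = accel ** 2
    # Negative weighted sum: a smaller total penalty means a larger reward.
    return -(weights.stability * stability_penalty
             + weights.efficiency * efficiency_penalty
             + weights.energy * energy_penalty)


# Two weight settings give two different reward functions.
balanced = RewardWeights()
energy_focused = RewardWeights(energy=3.0)

print(step_reward(0.5, 2.0, 1.0, balanced))
print(step_reward(0.5, 2.0, 1.0, energy_focused))
```

Changing only the weights, as in `balanced` versus `energy_focused`, is one plausible way that two different reward functions could lead to two control strategies trained in the same environments.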