| Traffic oscillations, or "stop-and-go" phenomena, significantly disrupt transportation operations. The frequent acceleration and deceleration they induce increase energy consumption, emissions, and safety risks. Existing studies indicate that autonomous vehicles with a suitably designed longitudinal control strategy can act as "stabilizers" within traffic flow and suppress traffic oscillations. However, current autonomous vehicle motion modeling mostly extends the unilateral control approach used for human-driven car following: it considers only the motion state of the leading vehicle and does not jointly consider the motion states of the preceding and following vehicles. This study designs a bilateral control model that accounts for the coupled motion of the vehicles ahead and behind, and analyzes its impact on traffic oscillations. The main research content is as follows. First, a bilateral control strategy is adopted to establish a Bilateral Control Model Based on Deep Reinforcement Learning (BCM-RL). Under this strategy, the ego vehicle keeps its position between the leading and following vehicles and maintains the average of their speeds. Based on this strategy, the model's action space and a state space covering the ego vehicle and the preceding and following vehicles are constructed, and a multi-objective reward function is designed that considers control efficiency, oscillation suppression ability, and safety. Because the tail vehicle in a bilateral control chain has no following vehicle and therefore cannot use the bilateral control strategy, a unilateral control strategy is designed for it: a Car Follow Model Based on Deep Reinforcement Learning (CFM-RL) is introduced as the tail vehicle of the BCM-RL vehicle chain to assist the BCM-RL vehicles. Second, a two-stage nested training framework 
is proposed to achieve efficient model training. In the first stage, a qualified CFM-RL is trained within the CFM-RL training environment. In the second stage, the CFM-RL vehicle from the first stage serves as the tail vehicle of the BCM-RL vehicle chain, and the BCM-RL is trained with parameter sharing. The I-80 dataset from the Next Generation Simulation (NGSIM) US highway data, which contains traffic oscillations, is used as the training set for both models. Finally, a simulation experiment scheme is designed to test how effectively the bilateral control model suppresses traffic oscillations. To make the test simulation more realistic and complex, speed curves from the I-80 dataset are concatenated to construct a test set of complex traffic scenarios. The safety, fuel consumption, emissions, and driving efficiency of the model are then tested in both purely autonomous driving environments and mixed traffic environments. The experimental results indicate that, in the purely autonomous driving test, the BCM-RL vehicle chain outperforms the other models on these performance indicators while maintaining normal following efficiency. In the mixed traffic test, the BCM-RL vehicle chain effectively alleviates traffic oscillations, and the traffic environment continues to improve as the length of the vehicle chain increases. |
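
The bilateral control strategy summarized above (hold the midpoint between the leading and following vehicles, and match the average of their speeds) can be illustrated with a minimal Python sketch. It is not the BCM-RL implementation: the names `VehicleState`, `bilateral_targets`, and `bilateral_reward`, the quadratic penalty form, the weights, and the `min_gap` threshold are all assumptions made for illustration, and vehicle length is ignored. The abstract states only that the reward considers control efficiency, oscillation suppression, and safety; how these objectives are actually formulated is not given here.

```python
from dataclasses import dataclass


@dataclass
class VehicleState:
    position: float  # longitudinal position along the lane, in metres
    speed: float     # longitudinal speed, in m/s


def bilateral_targets(leader: VehicleState, follower: VehicleState) -> tuple[float, float]:
    """Return the midpoint position and mean speed of the two neighbours."""
    target_position = 0.5 * (leader.position + follower.position)
    target_speed = 0.5 * (leader.speed + follower.speed)
    return target_position, target_speed


def bilateral_reward(
    ego: VehicleState,
    leader: VehicleState,
    follower: VehicleState,
    accel: float,
    min_gap: float = 2.0,              # hypothetical safety gap, metres
    w_pos: float = 1.0,                # hypothetical weight: position tracking
    w_speed: float = 1.0,              # hypothetical weight: speed tracking
    w_accel: float = 0.1,              # hypothetical weight: smoothness
    collision_penalty: float = 100.0,  # hypothetical safety penalty
) -> float:
    """Illustrative multi-objective reward (assumed form, not the paper's)."""
    target_position, target_speed = bilateral_targets(leader, follower)
    position_error = ego.position - target_position
    speed_error = ego.speed - target_speed

    # Tracking terms stand in for control efficiency; the acceleration term
    # stands in for oscillation suppression (penalising harsh speed changes).
    reward = -(
        w_pos * position_error ** 2
        + w_speed * speed_error ** 2
        + w_accel * accel ** 2
    )

    # Safety term: penalise getting too close to either neighbour.
    gap_ahead = leader.position - ego.position
    gap_behind = ego.position - follower.position
    if gap_ahead < min_gap or gap_behind < min_gap:
        reward -= collision_penalty
    return reward
```

In this sketch, an ego vehicle sitting exactly at the midpoint, at the mean speed, and with zero acceleration receives the maximum reward of zero; any deviation or unsafe gap lowers it. The tail vehicle has no follower, so a function of this shape cannot be evaluated for it, which is the limitation the CFM-RL tail vehicle addresses.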