Research On Longitudinal Cooperative Control Algorithm Of Vehicle Platoon Based On Reinforcement Learning

Posted on: 2020-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z R Tang
GTID: 2392330620955965
Subject: Vehicle Engineering
Abstract/Summary:
This thesis studies decision and control algorithms for longitudinal cooperative driving of a vehicle platoon. Unlike traditional rule-based control methods, it uses reinforcement learning (RL) to solve the acceleration decision problem in the platoon. Every following vehicle in the platoon drives in a similar external environment, so to reduce the difficulty of policy learning, each following vehicle is treated as an independent agent and all of them share the same decision-making model. The decision problem of platoon cooperative driving is thereby transformed into policy learning for a single vehicle node: when the single-vehicle policy tends to converge, the platoon also drives steadily. The thesis covers the following four aspects:

(1) The basic theory of RL and the principles of representative algorithms are introduced. After analyzing the advantages and disadvantages of each algorithm, an optimized version of the Deep Deterministic Policy Gradient (DDPG) algorithm that incorporates the idea of imitation learning is proposed.

(2) The implementation of DDPG is explained, including its experience replay and target network techniques. A Markov decision process (MDP) model of vehicle cooperative driving is established. DDPG is then used to train the vehicle to learn a cooperative driving policy based on constant spacing. Once the vehicle-node policy tends to converge, a cooperative driving simulation of a four-vehicle platoon is carried out. The simulation results show that the learned policy keeps the platoon driving stably.

(3) The full velocity difference car-following model is used as a demonstration policy for the following vehicle. A new supervisory cost function is designed to ensure that the demonstration policy plays its supervisory role during training. A comparison of training before and after this improvement to DDPG shows that the policy converges faster after pre-training. The reward function in the MDP model is also optimized for ride comfort. The simulation results show that the ride stability of the platoon improves under the same driving conditions.

(4) A 1:5 scale miniature intelligent vehicle platform is built, with hardware selected according to the requirements. An incremental PID controller is designed for closed-loop speed control, and the pure pursuit algorithm is used to keep the vehicle on a fixed path. Based on the distributed process framework of ROS, the program architecture of the upper-level control system is designed, and the node program of each functional module is written and tested. Cooperative driving experiments on the platform verify that the upper-level controller based on DDPG achieves good following performance.
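The incremental PID speed controller mentioned in item (4) can be illustrated with a minimal sketch. It uses the standard incremental (velocity-form) PID law, in which each step computes a change in the control output from the last three errors rather than an accumulated integral. The class name `IncrementalPID`, the gains, and the output limits are illustrative assumptions; the abstract does not give the thesis's actual controller parameters or code.

```python
class IncrementalPID:
    """Incremental (velocity-form) PID: computes a change in output each step.

    Hypothetical sketch; gains and limits are NOT taken from the thesis.
    """

    def __init__(self, kp, ki, kd, u_min=-1.0, u_max=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.u_min, self.u_max = u_min, u_max
        self.e1 = 0.0  # error at step k-1
        self.e2 = 0.0  # error at step k-2
        self.u = 0.0   # last control output (e.g. a normalized motor command)

    def step(self, target_speed, measured_speed):
        e = target_speed - measured_speed
        # du = Kp*(e_k - e_{k-1}) + Ki*e_k + Kd*(e_k - 2*e_{k-1} + e_{k-2})
        du = (self.kp * (e - self.e1)
              + self.ki * e
              + self.kd * (e - 2.0 * self.e1 + self.e2))
        # Accumulate the increment and clamp to the actuator range.
        self.u = min(self.u_max, max(self.u_min, self.u + du))
        self.e2, self.e1 = self.e1, e
        return self.u
```

Because the controller stores only the previous output and two past errors, clamping the output also limits integral wind-up, which is one common reason for choosing the incremental form on small embedded platforms.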
Keywords/Search Tags:Reinforcement learning, DDPG, Vehicle platoon, CACC