| Autonomous driving is currently an active research topic in vehicle engineering. Different types of vehicle decision control systems each have their own characteristics, and improving the generalization ability of their algorithms is a long-standing research goal. Based on an in-depth analysis of different autonomous control methods, this paper proposes an autonomous driving decision control model based on imitation-reinforcement learning. The main contents are as follows. (1) DDPG-based autonomous driving decision control. To address the poor generalization of model-based control methods, this paper adopts the model-free DDPG algorithm as the basic algorithmic framework, in order to obtain good generalization performance in autonomous driving tasks with changing scenes. (2) Autonomous driving decision control based on WGAIL-DDPG. When reinforcement learning is applied to autonomous driving control tasks, the agent needs extensive trial and error to explore the optimal policy, so learning efficiency and the risk incurred during trial and error are important concerns. To this end, this paper proposes the WGAIL-DDPG algorithm, which introduces imitation learning in the early stage of reinforcement learning training to reduce the agent’s action search space and the number of trial-and-error attempts, thereby improving learning efficiency. (3) Autonomous driving decision control in a multi-vehicle environment. At present, most reinforcement-learning-based autonomous driving control algorithms are trained in an idealized single-vehicle environment, which leaves the resulting models insufficiently adaptable. To solve this problem, this paper builds on the constructed WGAIL-DDPG decision control model and designs the reward function specifically for the multi-vehicle setting. The experimental results show that, with the designed reward function, the trained autonomous driving decision model 
can achieve safe and smooth autonomous driving of the target vehicle in a multi-vehicle environment. (4) Parameter optimization of the DDPG training process. To address the tendency of the original DDPG algorithm to fall into local optima when applied to autonomous driving control tasks, this paper adds the discriminator’s supervision signal to the DDPG training process, which prevents the autonomous vehicle control system from adopting unreasonable vehicle control strategies. The focus of this work is twofold: designing an imitation-reinforcement learning control model to improve the training speed and stability of the model, and designing the reinforcement learning reward function specifically for the multi-vehicle application environment. |
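
The abstract does not give the reward terms or the exact way the discriminator signal enters DDPG training, so the following is only a minimal sketch of one plausible arrangement, not the paper's formulation. It assumes a linear warm-up schedule that favors the discriminator (critic) score early in training, and a hand-written multi-vehicle reward with a gap penalty and a collision penalty. All function names, parameters, and constants (`imitation_weight`, `multi_vehicle_reward`, `shaped_reward`, `warmup_steps`, `safe_gap`, the `-100.0` penalty) are hypothetical.

```python
import math


def imitation_weight(step: int, warmup_steps: int = 10_000) -> float:
    """Hypothetical schedule: weight decays linearly from 1 to 0 over warmup_steps."""
    if warmup_steps <= 0:
        return 0.0
    return max(0.0, 1.0 - step / warmup_steps)


def multi_vehicle_reward(speed: float, heading_error: float, lateral_offset: float,
                         gap_to_lead: float, collided: bool,
                         safe_gap: float = 10.0) -> float:
    """Hypothetical environment reward for driving among other vehicles.

    speed is in m/s, heading_error in radians, lateral_offset is normalized
    to [-1, 1] of half the lane width, and gap_to_lead is in metres.
    """
    if collided:
        return -100.0  # assumed large penalty for any collision
    # Reward progress along the lane and penalize drifting from the lane centre.
    progress = speed * (math.cos(heading_error) - abs(lateral_offset))
    # Penalize following the lead vehicle closer than the assumed safe gap.
    gap_penalty = max(0.0, safe_gap - gap_to_lead)
    return progress - gap_penalty


def shaped_reward(env_reward: float, critic_score: float, step: int,
                  warmup_steps: int = 10_000) -> float:
    """Blend the environment reward with the discriminator (critic) score.

    Early in training the critic score dominates (imitation); after the
    warm-up only the environment reward remains (reinforcement).
    """
    w = imitation_weight(step, warmup_steps)
    return (1.0 - w) * env_reward + w * critic_score
```

In a training loop, the value returned by `shaped_reward` would be stored in the replay buffer in place of the raw environment reward, so the DDPG update itself would not need to change under this assumed design.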