Vehicle automation and intelligence are the focus of future research and development across the automobile industry: they can effectively alleviate social problems such as traffic accidents, exhaust pollution, and traffic congestion caused by the sharp increase in car ownership. Deep reinforcement learning has great advantages in sequential decision-making problems with high-dimensional state spaces and continuous action spaces, and it has been widely studied and applied in autonomous driving. Building on deep reinforcement learning, this article studies autonomous driving. The specific contributions are as follows.

First, the Deep Deterministic Policy Gradient (DDPG) algorithm samples historical data uniformly at random from the experience replay buffer, so the data it learns from is of uneven quality. To address this problem, a DDPG algorithm based on double-priority experience replay is proposed: the experience replay buffer is divided, by a fixed proportion, into two data areas with different priorities (a minimal sketch follows this abstract). This optimizes the distribution of historical experience used during training, improves sample utilization, and accelerates convergence. Simulation experiments show that, compared with the traditional DDPG algorithm, the proposed algorithm obtains a more stable policy in fewer training rounds.

Second, because the traditional DDPG algorithm struggles to converge within a short time in relatively complex environments, a DDPG algorithm combined with guiding experience is proposed. The algorithm obtains a small amount of prior knowledge by designing a simple controller for some actions as an early guide and uses this knowledge to pre-train the network, so that the state-action search concentrates on high-value state transitions. By dividing the experience replay buffer into two parts, the replay probability of guiding experience can be raised in the early stage of training (see the second sketch below), which improves training speed and accelerates convergence. Simulation experiments show that the algorithm completes the task effectively with the help of prior knowledge and trains well.

Finally, to address the problem that an agent stuck in a negative-reward state for a long time cannot learn a good policy, a DDPG algorithm based on Random Network Distillation (RND) is proposed to drive the agent to explore the environment and to keep the vehicle from stopping (see the RND sketch below). Convergence is further accelerated by introducing curriculum learning, and a double-critic network structure is used to mitigate the overestimation bias of the DDPG algorithm. Simulation experiments confirm that, driven by curiosity, the algorithm completes the task effectively and trains well.
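The following is a minimal sketch of a double-priority replay buffer of the kind described above. It assumes transitions are routed into the two regions by TD error and that a fixed fraction of each minibatch is drawn from the high-priority region; the split ratio, threshold, and routing rule are illustrative assumptions, not the thesis's exact scheme.

```python
import random
from collections import deque

class DualPriorityReplayBuffer:
    """Replay buffer split into a high-priority and a low-priority region.

    Transitions whose TD error exceeds a threshold go to the high-priority
    region; sampling draws a fixed fraction of each minibatch from that
    region, so informative transitions are replayed more often.
    """

    def __init__(self, capacity=100_000, high_fraction=0.3, td_threshold=1.0):
        high_cap = int(capacity * high_fraction)
        self.high = deque(maxlen=high_cap)            # high-priority region
        self.low = deque(maxlen=capacity - high_cap)  # low-priority region
        self.high_fraction = high_fraction
        self.td_threshold = td_threshold

    def add(self, transition, td_error):
        # Route by TD error: large-error transitions are kept where they
        # will be sampled more frequently.
        if abs(td_error) >= self.td_threshold:
            self.high.append(transition)
        else:
            self.low.append(transition)

    def sample(self, batch_size):
        # Draw a fixed share of the minibatch from each region.
        n_high = min(int(batch_size * self.high_fraction), len(self.high))
        n_low = min(batch_size - n_high, len(self.low))
        return random.sample(self.high, n_high) + random.sample(self.low, n_low)
```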
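Along the same lines, the guiding-experience mechanism of the second contribution can be sketched as two pools with an annealed replay probability: guiding transitions come from the simple hand-designed controller and are replayed with high probability early in training. The schedule constants (p_start, p_end, anneal_steps) are illustrative assumptions.

```python
import random
from collections import deque

class GuidedReplayBuffer:
    """Replay buffer split into guiding experience and agent experience.

    The probability of replaying a guiding transition starts high and is
    annealed toward a floor, so early training leans on prior knowledge
    and later training relies on the agent's own experience.
    """

    def __init__(self, capacity=100_000):
        self.guide = []                      # transitions from the simple controller
        self.agent = deque(maxlen=capacity)  # transitions from the learning agent

    def add_guide(self, transition):
        self.guide.append(transition)

    def add_agent(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size, step, anneal_steps=50_000,
               p_start=0.5, p_end=0.05):
        # Linearly anneal the guiding-experience replay probability from
        # p_start down to p_end over anneal_steps environment steps.
        frac = min(step / anneal_steps, 1.0)
        p_guide = p_start + frac * (p_end - p_start)
        batch = []
        for _ in range(batch_size):
            use_guide = self.guide and (not self.agent or random.random() < p_guide)
            pool = self.guide if use_guide else self.agent
            batch.append(random.choice(pool))
        return batch
```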
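For the RND-driven exploration of the third contribution, a minimal sketch follows, assuming the standard RND construction: a randomly initialized target network is frozen, a predictor network is trained to match it, and the per-state prediction error serves as the curiosity bonus. Layer sizes and the bonus scale are assumptions.

```python
import torch
import torch.nn as nn

class RND(nn.Module):
    """Random Network Distillation intrinsic reward (sketch).

    The prediction error is large for states the predictor has rarely
    seen, so it acts as an intrinsic reward that pushes the agent to
    explore rather than stop.
    """

    def __init__(self, obs_dim, feat_dim=64):
        super().__init__()
        def mlp():
            return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, feat_dim))
        self.target, self.predictor = mlp(), mlp()
        for p in self.target.parameters():
            p.requires_grad_(False)  # target stays fixed forever

    def intrinsic_reward(self, obs):
        # Squared prediction error per state: high for novel observations.
        with torch.no_grad():
            t = self.target(obs)
        return (self.predictor(obs) - t).pow(2).mean(dim=-1)
```

In use, the predictor would be trained by minimizing intrinsic_reward(obs).mean() over visited states, and a scaled bonus would be added to the environment reward before the DDPG update.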
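Finally, one plausible reading of the double-critic structure is a clipped-double-Q target in the style of TD3; the sketch below assumes that formulation, with the critic and actor networks supplied by the caller.

```python
import torch

def td_target(reward, done, next_state,
              target_actor, target_q1, target_q2, gamma=0.99):
    """Clipped double-critic TD target (TD3-style sketch).

    Taking the minimum of two independently trained target critics
    counteracts the overestimation bias of a single DDPG critic; the
    thesis's exact variant may differ in its details.
    """
    with torch.no_grad():
        next_action = target_actor(next_state)
        q_next = torch.min(target_q1(next_state, next_action),
                           target_q2(next_state, next_action))
        return reward + gamma * (1.0 - done) * q_next
```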