Today, with the continuous progress of artificial intelligence, automobiles have entered the era of intelligence, and many well-known automobile manufacturers around the world have invested heavily in autonomous driving technology. As a product of the information technology revolution, autonomous driving plays an important role in promoting environmental sustainability and improving urban traffic and driving safety. At present, the common autonomous driving solutions integrate perception, decision, control, and other sub-modules. This model faces several difficulties: (1) the rule-based strategy requires a great deal of manual design, which is complicated, cumbersome, and extremely expensive; (2) it adapts poorly to densely populated, complex urban traffic environments; (3) the lower modules are tightly coupled to the upper modules, so system maintenance is cumbersome and complicated. In view of these difficulties, this paper uses the Carla urban driving simulator to carry out a simulation study of end-to-end control for intelligent driving with deep reinforcement learning.

First, this paper adopts the Deep Deterministic Policy Gradient (DDPG) algorithm, which requires little manual parameter design, is model-free, and is capable of adaptive learning. The algorithm is built on the actor-critic framework, which copes with high-dimensional inputs while also producing continuous action outputs. DDPG is first used to complete simulation training on the CG1 and CG2 tracks in the Torcs simulation environment. The results show that DDPG can learn a good control strategy for driverless cars, verifying the feasibility of applying DDPG to intelligent vehicle control.

Then, to better approximate real driving scenes, DDPG code based on the Carla simulation environment is written in three parts: environment, neural network, and agent. OpenAI Gym provides a near-standard convention for writing environment code, and the environment in this paper follows the same convention; it is organised into four methods: reset, step, render, and reward (a minimal sketch of this interface is given below). The neural network part follows the original DDPG framework and is divided into two networks, actor and critic. The agent code is the core of the DDPG algorithm, and training is carried out mainly by its train method: the DDPG agent interacts with the environment, learns by trial and error, and adjusts and improves the driving strategy at every training step until the optimal control strategy is found.

Second, reinforcement learning requires random trial and error during training, and such trial-and-error behavior is too costly for a real car. Therefore, building on the DDPG algorithm, a real-time monitor of dangerous vehicle behaviors is inserted between the environment and the agent; it constrains and corrects the agent's dangerous actions (also sketched below), thereby reducing trial-and-error behavior and improving training efficiency.
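As a rough illustration of the Gym-style interface described above, the sketch below shows one way the environment class can be organised around reset, step, render, and reward. It is a minimal sketch only: the Carla-specific calls are left as placeholder comments, and the state dimension, action layout, and reward terms are assumptions made for illustration rather than the exact design used in the experiments.

```python
import numpy as np


class CarlaLaneKeepEnv:
    """Gym-style environment wrapper for Carla (interface sketch only)."""

    def __init__(self):
        self.state_dim = 10   # assumed size of the low-dimensional state vector
        self.action_dim = 2   # assumed action layout: [steer, throttle]

    def reset(self):
        """Respawn the ego vehicle and return the initial observation."""
        # ... restart the Carla episode here ...
        return np.zeros(self.state_dim, dtype=np.float32)

    def step(self, action):
        """Apply one control action and advance the simulator by one tick."""
        # ... send steer/throttle to Carla and read back the new measurements ...
        next_state = np.zeros(self.state_dim, dtype=np.float32)
        collided, off_lane = False, False        # would be read from Carla sensors
        r = self.reward(speed=0.0, lane_offset=0.0,
                        collided=collided, off_lane=off_lane)
        done = collided or off_lane
        return next_state, r, done, {}

    def render(self):
        """Optionally display the camera image of the current frame."""
        pass

    def reward(self, speed, lane_offset, collided, off_lane):
        """Illustrative shaping: reward speed, penalise lane deviation and crashes."""
        r = speed - 0.5 * abs(lane_offset)
        if collided or off_lane:
            r -= 100.0
        return r
```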
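The actor and critic described above can be realised as small fully connected networks. The sketch below is written in PyTorch purely for illustration; the framework, layer widths, and input/output sizes are assumptions, since the text only specifies that the neural-network part follows the original DDPG framework with separate actor and critic networks. The agent's train method would additionally hold target copies of both networks and a replay buffer, moving the critic toward the target r + γ·Q'(s', μ'(s')) and updating the actor along the gradient of Q(s, μ(s)).

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Deterministic policy: maps a state to a continuous action in [-1, 1]."""

    def __init__(self, state_dim=10, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),   # e.g. steer and throttle
        )

    def forward(self, state):
        return self.net(state)


class Critic(nn.Module):
    """Q-network: scores a (state, action) pair."""

    def __init__(self, state_dim=10, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```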
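The supervision mechanism can be pictured as a filter inserted between the agent and the environment: every action proposed by the actor is checked against the current state and overridden when it looks dangerous, before it is sent to the simulator. The sketch below is a minimal illustration of this idea under assumed thresholds, an assumed state layout, and invented correction rules; it is not the exact monitor used in the experiments.

```python
import numpy as np


class SafetySupervisor:
    """Inspects each proposed action and corrects it when it looks dangerous."""

    def __init__(self, max_steer_when_fast=0.3, speed_limit=8.0):
        self.max_steer_when_fast = max_steer_when_fast   # assumed threshold
        self.speed_limit = speed_limit                   # m/s, illustrative

    def filter(self, state, action):
        steer, throttle = float(action[0]), float(action[1])
        speed, lane_offset = float(state[0]), float(state[1])  # assumed state layout

        # 1. Limit sharp steering at high speed.
        if speed > self.speed_limit:
            steer = float(np.clip(steer, -self.max_steer_when_fast,
                                  self.max_steer_when_fast))

        # 2. If the car drifts too far from the lane centre, steer back and slow down.
        if abs(lane_offset) > 1.0:
            steer = -0.3 * float(np.sign(lane_offset))
            throttle = min(throttle, 0.3)

        return np.array([steer, throttle], dtype=np.float32)


# Usage inside the training loop (supervised DDPG), with hypothetical agent/env objects:
#   raw_action  = agent.act(state)
#   safe_action = supervisor.filter(state, raw_action)
#   next_state, reward, done, _ = env.step(safe_action)
#   agent.remember(state, safe_action, reward, next_state, done)
```

Storing the corrected action rather than the raw one in the replay buffer is one reasonable design choice, since the critic then learns from the transitions that were actually executed.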
The DDPG algorithm and the supervised DDPG algorithm were each trained for 70,000 episodes in the Carla simulation environment. The simulation results show that the two algorithms eventually reach the same training effect: both can drive normally and effectively avoid obstacles, but the supervised DDPG algorithm converges faster than DDPG. Next, with the map, the number of dynamic factors, and the weather as control variables, the two algorithm models were evaluated on the lane-keeping task under a unified evaluation scheme on the experimental platform. In the environment without dynamic factors and the environment with dynamic factors, the average task completion of supervised DDPG is 98% and 89%, respectively, and that of DDPG is 97% and 88%, respectively. Compared with the lane-keeping task reported in the official Carla paper of 2017, the average task completion is greatly improved.

End-to-end control of self-driving cars with deep reinforcement learning not only alleviates the serious drawbacks of traditional solutions built from tightly coupled upper and lower modules, but also shortens the development cycle. Supervised reinforcement learning significantly improves the convergence speed and effectively reduces the agent's trial-and-error frequency in the early stage of training. Therefore, combining supervised learning with reinforcement learning offers a new way to reduce the risk of trial and error in reinforcement learning, and provides a useful reference for moving end-to-end intelligent driving with deep reinforcement learning from the simulation environment to practical application.