With traffic accidents occurring frequently, autonomous driving has attracted wide attention from researchers as a technology that can effectively reduce accidents and ease the workload of drivers. How to complete vehicle driving decision control efficiently and reliably has become a research hotspot in the field of autonomous driving. End-to-end autonomous driving decision making does not depend on the cooperation of the subsystems within an autonomous driving system; instead, it outputs a driving decision directly from sensor perception of the environment, which reduces the complexity and difficulty of implementing autonomous driving. In this thesis, we propose an end-to-end autonomous driving decision algorithm based on imitation and deep reinforcement learning, which effectively improves both the performance and the training efficiency of the driving decision model. The main work of this thesis is as follows.

First, policy networks are respectively built based on the Deep Deterministic Policy Gradient (DDPG) algorithm and the Soft Actor-Critic (SAC) algorithm, and policy network training for the lane-keeping task is carried out in the TORCS autonomous driving simulator. After fully considering the vehicle speed and the driving task, the reward function is designed and then improved by adding the change in action to the reward, which smooths the vehicle's driving behaviour. Comparison experiments show that the SAC algorithm is better suited than DDPG to autonomous driving decision making.

Next, the Behavior Cloning (BC) algorithm from imitation learning is introduced: using an expert policy network that has already been trained to a good level, the learner's policy network is obtained by directly cloning the expert's behaviour, which addresses the slow convergence and low training efficiency of the vision-based SAC policy model. To address the shortage of expert demonstration data, the expert-data acquisition algorithm is improved to collect expert actions in more states, so that the policy network trained by the BC algorithm performs better on the lane-keeping task.

Finally, the Dataset Aggregation (DAgger) algorithm is introduced to improve BC by reducing the compounding error that arises during training. DAgger is then combined with a deep reinforcement learning algorithm, and the DAgger-SAC algorithm based on imitation-deep reinforcement learning is proposed, which removes DAgger's reliance on manual expert labeling of demonstrated actions and effectively saves labor cost. Training is continued with DAgger-SAC from the policy network obtained by behavior cloning, and experiments show that, compared with the vision-based SAC algorithm, the proposed DAgger-SAC algorithm substantially reduces the number of interactions with the environment and reaches a higher cumulative episode reward in fewer training episodes, effectively improving model training efficiency.
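The reward-function improvement described above, adding the action change value to smooth driving, can be sketched as follows. This is a minimal illustrative sketch, not the thesis's exact formula: the progress term (speed projected along the track, penalised by heading angle and lateral offset) is a common TORCS lane-keeping shaping, and the weight `w_smooth` and the argument names are assumptions.

```python
import numpy as np

def lane_keeping_reward(speed, angle, track_pos, action, prev_action,
                        w_smooth=0.1):
    """Lane-keeping reward with an action-smoothness penalty.

    speed:       longitudinal speed of the car
    angle:       angle between the car heading and the track axis (rad)
    track_pos:   lateral offset from the lane centre, normalised to [-1, 1]
    action:      current action vector (e.g. [steering])
    prev_action: action taken at the previous step
    """
    # Progress term: reward forward speed along the track direction,
    # penalise heading error and lateral drift from the lane centre.
    progress = speed * (np.cos(angle) - abs(np.sin(angle)) - abs(track_pos))
    # Smoothness term: penalise large step-to-step action changes,
    # discouraging jerky steering.
    delta = np.asarray(action, dtype=float) - np.asarray(prev_action, dtype=float)
    smoothness = -w_smooth * float(np.sum(np.square(delta)))
    return progress + smoothness
```

With this shaping, two states that are otherwise identical yield a lower reward when the action jumps sharply between steps, so the policy is pushed toward smoother control.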
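The data-aggregation step behind DAgger-SAC can be sketched as a loop in which the states visited by a mixture of learner and expert control are always labeled by the expert; in the DAgger-SAC setting the "expert" is a pretrained policy network rather than a human annotator, which is what removes the manual-labeling cost. The function signatures, the mixing coefficient schedule (`beta0`, `decay`), and the toy interfaces below are assumptions for illustration, not the thesis's implementation.

```python
import numpy as np

def dagger_collect(env_step, reset, learner_act, expert_act,
                   n_rounds=3, horizon=50, beta0=1.0, decay=0.5):
    """DAgger-style data aggregation with an automated expert.

    env_step(s, a) -> (next_state, done): environment transition
    reset() -> state:                     start a new episode
    learner_act(s), expert_act(s) -> a:   learner / expert policies

    Rolls out a beta-mixture of expert and learner control, but labels
    every visited state with the expert's action; `beta` decays each
    round so control shifts toward the learner over time.
    """
    dataset = []  # aggregated (state, expert_action) pairs
    beta = beta0
    for _ in range(n_rounds):
        s = reset()
        for _ in range(horizon):
            # Execute the expert's action with probability beta,
            # otherwise the learner's; the expert labels the state
            # either way (no human annotation needed).
            a_exec = expert_act(s) if np.random.rand() < beta else learner_act(s)
            dataset.append((s, expert_act(s)))
            s, done = env_step(s, a_exec)
            if done:
                break
        beta *= decay
        # (After each round, the learner would be retrained or, as in
        # DAgger-SAC, fine-tuned with the RL update on `dataset`.)
    return dataset
```

Because the expert labels states reached under the learner's own distribution, the aggregated dataset corrects the compounding error that plain behavior cloning suffers from.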