Autonomous driving vehicles integrate advanced technologies such as sensors, network communication, information security, a perception module, a decision and planning module, and an action execution module. The decision and planning module, equivalent to a human brain, is the core of the autonomous driving system. With the rapid development of AI technology in recent years, reinforcement learning has gradually emerged as one of the mainstream methods leading toward an intelligent future. In this thesis, we use reinforcement learning to design a decision-making scheme for lane keeping and verify its feasibility and effectiveness in a simulation environment.

First, a typical autonomous driving system is studied, and the specific tasks each module is responsible for in driving the vehicle are discussed. The basics of the PID controller and deep learning, together with the core theory of reinforcement learning, are introduced, followed by an in-depth discussion of the DQN (Deep Q-Network) algorithm, a representative deep reinforcement learning algorithm.

Secondly, an end-to-end decision-making framework based on DQN is designed, and lane-keeping simulation experiments are carried out on the CARLA platform. A general state space and two candidate forms of the environment's reward function are meticulously designed. By comparing the performance of the DQN agent during training and testing under the two reward functions, we determine which form more effectively guides the agent's self-improvement, demonstrating the critical role of the reward function in reinforcement learning.

Furthermore, based on the agent's decision-making performance under the DQN algorithm, the TGDQN (two-stream GRU DQN) algorithm, an improved version of DQN, is proposed. In detail, we improve how the DQN algorithm replays experience, increasing the utilization of the experience buffer
data. We also change the network architecture into two parallel streams that separate, within the immediate reward, the partial reward arising from the environment state and the partial reward for performing actions. In addition, by analyzing the partial observability of the state in the MDP (Markov Decision Process) for autonomous driving, we introduce the recurrent neural network GRU (Gated Recurrent Unit) to address it.

Finally, an end-to-end decision-making framework based on TGDQN and a combined decision-making framework of TGDQN and PID are designed, and simulation experiments are performed. Comparing the three decision-making methods (DQN, TGDQN, and the combined method) in terms of the agent's episode-reward curve during training and the average task-completion rate during testing shows that decision-making under the TGDQN algorithm is significantly better than under DQN. The combined method ranks second only to TGDQN, indicating that combining reinforcement learning with traditional control methods is feasible.

There are 48 figures, 18 tables, and 66 references in this thesis.
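The DQN frameworks summarized above all rest on the same Bellman target used during experience replay. As a minimal sketch of that update, assuming a toy minibatch, illustrative array shapes, and a discount factor `gamma` not specified in this abstract:

```python
import numpy as np

def dqn_targets(q_next, rewards, dones, gamma=0.99):
    """Bellman targets y = r + gamma * max_a' Q(s', a'); 0 future value if terminal."""
    return rewards + gamma * (1.0 - dones) * q_next.max(axis=1)

# Toy replay minibatch: 3 transitions, 2 discrete actions.
q_next = np.array([[1.0, 2.0],
                   [0.5, 0.0],
                   [3.0, 1.0]])      # Q(s', a') from the target network
rewards = np.array([1.0, -1.0, 0.5])
dones = np.array([0.0, 0.0, 1.0])   # last transition ends the episode
print(dqn_targets(q_next, rewards, dones))  # [2.98, -0.505, 0.5]
```

In practice the Q-values come from a neural network rather than a fixed array; the target computation itself is unchanged.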
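The abstract does not give TGDQN's exact aggregation rule for its two parallel streams; a dueling-style combination, where a state-value stream and an action stream are merged with a mean-subtracted advantage, is one common way to realize such a split and is sketched here under that assumption (the names `two_stream_q`, `v`, and `a` are illustrative):

```python
import numpy as np

def two_stream_q(state_value, action_advantages):
    """Combine a state-value stream and an action stream into Q-values.

    Subtracting the mean advantage keeps the decomposition identifiable,
    so the two streams cannot trade a constant back and forth.
    """
    adv = action_advantages - action_advantages.mean(axis=1, keepdims=True)
    return state_value[:, None] + adv

v = np.array([1.0])               # batch of one state value
a = np.array([[0.5, -0.5, 0.0]])  # advantages for three discrete actions
print(two_stream_q(v, a))         # [[1.5  0.5  1. ]]
```

In a recurrent variant, both streams would be fed from a GRU over the recent observation history to cope with partial observability, as the abstract describes.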