With the advance of technology and the support of national policy, driverless cars have become a major trend. Unmanned vehicles can meet the demand for higher driving safety, optimize the allocation of resources, space, and time, and reduce the environmental impact of pollutant emissions. Path planning is the bridge between information perception and intelligent control in unmanned vehicles and is one of the key technologies for realizing vehicle intelligence. Its main goal is to plan a safe and reliable driving path from the starting point to the desired end point while satisfying all constraints in a complex, changing environment with dynamic obstacles.

This paper first analyzes the modules of autonomous driving technology, constructs a mathematical model of the vehicle based on its kinematic and dynamic constraints, and compares the advantages and disadvantages of deep reinforcement learning (DRL) algorithms. It then applies discrete DRL algorithms to unmanned-vehicle path planning, studying and analyzing the Deep Q-Network (DQN) algorithm. To address DQN's overfitting problem, the Dueling DQN algorithm, which adds a dueling (competing) evaluation network, is chosen for vehicle path planning; to address Dueling DQN's low sampling efficiency and the difficulty of setting its learning rate, the MOD-Dueling DQN algorithm is proposed, combining a weighted experience replay mechanism with a W-Adam optimizer.

The paper next analyzes the end-to-end intelligent driving model and, in the CARLA environment, constructs a global planner, a behavioral planner that includes vehicle tracking control and lane-change control, and a local planner based on the Lattice algorithm implemented in the Frenet coordinate system.

Finally, a continuous DRL algorithm is used to realize end-to-end deep path planning for unmanned vehicles. The Deep Deterministic Policy Gradient (DDPG) algorithm is chosen because it plans end to end directly, without any discretization during execution. To remedy DDPG's low sampling efficiency caused by uniform sampling, its poor exploration caused by the deterministic policy gradient, and the difficulty of setting its learning rate, the MOD-DDPG algorithm is designed, introducing a prioritized weighted experience replay mechanism, Ornstein-Uhlenbeck (OU) noise, and a W-Adam adaptive learning-rate optimizer. Comparative validation and result analysis in the CARLA simulator demonstrate the practicality of the algorithms.
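To make the dueling evaluation network concrete, the sketch below shows the standard Dueling DQN head (Wang et al., 2016), which decomposes the action value into a state-value stream and an advantage stream, Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). The layer sizes and state/action dimensions are illustrative assumptions, not values taken from this work.

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v, a = self.value(h), self.advantage(h)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```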
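The weighted experience replay named above is, in its standard form, proportional prioritized experience replay (Schaul et al., 2016): transitions are sampled with probability proportional to their TD error, and the resulting bias is corrected with importance-sampling weights. The buffer below is a minimal sketch of that standard mechanism; the exact weighting used by the MOD variants may differ.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized replay with importance-sampling weights."""
    def __init__(self, capacity: int, alpha: float = 0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities, self.pos = [], [], 0

    def push(self, transition, priority: float = 1.0):
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(priority)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4):
        # Sampling probability proportional to priority^alpha.
        p = np.asarray(self.priorities) ** self.alpha
        p = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        # Importance-sampling weights correct the non-uniform sampling bias.
        w = (len(self.data) * p[idx]) ** (-beta)
        w = w / w.max()
        return [self.data[i] for i in idx], idx, w

    def update_priorities(self, idx, td_errors, eps: float = 1e-6):
        # Larger TD error -> higher replay priority.
        for i, e in zip(idx, td_errors):
            self.priorities[i] = abs(float(e)) + eps
```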
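For the Lattice local planner, a common Frenet-frame construction generates candidate lateral trajectories d(t) as quintic polynomials satisfying boundary conditions on position, velocity, and acceleration at t = 0 and t = T. The sketch below solves for those coefficients; the cost terms and sampling ranges used in this work are not specified in the abstract and are therefore omitted.

```python
import numpy as np

def lateral_quintic(d0, v0, a0, dT, vT, aT, T):
    """Coefficients of d(t) = c0 + c1*t + ... + c5*t**5 meeting the six
    boundary conditions (position, velocity, acceleration at t=0 and t=T)."""
    c0, c1, c2 = d0, v0, a0 / 2.0
    A = np.array([[T**3,   T**4,    T**5],
                  [3*T**2, 4*T**3,  5*T**4],
                  [6*T,    12*T**2, 20*T**3]])
    b = np.array([dT - (c0 + c1*T + c2*T**2),
                  vT - (c1 + 2*c2*T),
                  aT - 2*c2])
    c3, c4, c5 = np.linalg.solve(A, b)
    return np.array([c0, c1, c2, c3, c4, c5])

# A lattice planner samples many terminal offsets dT and horizons T,
# scores each candidate with comfort/safety costs, and keeps the best one.
```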
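The OU noise introduced in MOD-DDPG is a mean-reverting, temporally correlated process added to the deterministic actor's output to improve exploration in continuous action spaces. The sketch below uses the common defaults from the DDPG paper (Lillicrap et al., 2016); the parameters chosen in this work may differ.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process for exploration in continuous control."""
    def __init__(self, action_dim: int, mu: float = 0.0,
                 theta: float = 0.15, sigma: float = 0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.full(action_dim, mu)

    def reset(self):
        self.state = np.full_like(self.state, self.mu)

    def sample(self) -> np.ndarray:
        # dx = theta * (mu - x) + sigma * N(0, 1): mean-reverting and
        # temporally correlated, unlike independent Gaussian noise.
        dx = self.theta * (self.mu - self.state) \
             + self.sigma * np.random.randn(*self.state.shape)
        self.state = self.state + dx
        return self.state

# During training: action = actor(state) + noise.sample(), then clip the
# result to the valid action range before sending it to the simulator.
```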