Intelligent driving is one of the most active application scenarios in artificial intelligence research, and path planning technology plays a key role in advancing it. Path planning requires an agent to plan a safe, collision-free, optimal path from a starting point to a target point in an unknown environment. Traditional path planning algorithms have their strengths, but they also have many shortcomings, in particular poor adaptability to unknown, complex environments. Deep reinforcement learning uses neural networks to process the current environmental state, which helps agents plan paths in such environments. However, deep reinforcement learning algorithms for path planning also suffer from problems such as slow convergence and overestimation; mitigating these problems allows the algorithms to perform markedly better in agent path planning. Drawing on the relevant state of the art, this paper analyzes the problems that arise in path planning and proposes improvements. The main contributions are as follows:

(1) An adaptive exploration strategy is proposed to address the difficulty of balancing exploration and exploitation when the agent selects actions. If the agent focuses only on exploring new environmental information while learning, convergence is slow and learning efficiency is low; if, conversely, it over-exploits existing experience, it easily falls into local optima. The proposed mechanism establishes a mapping between the reward value and the parameter of the ε-greedy strategy and adjusts the strength of exploration through this connection. In other words, the agent adaptively adjusts the probability of choosing exploration or exploitation according to its learning progress, thereby accelerating the convergence of the algorithm (a minimal sketch of such a mapping appears below).

(2) A redesign of the objective function is proposed to address the overestimation problem of the DQN algorithm. During path planning with DQN, every network update takes the action that maximizes the target Q-network's output. Because there is some error between the true values and the values the network estimates, the agent may fail to select the truly optimal action even when this error is uniformly distributed, which leads to overestimation. The mechanism proposed here adds a correction function, which varies with the Q-values, to the target value function so as to widen the gap between the optimal and suboptimal values and make the optimal action easier to identify (one possible instantiation is sketched below).

(3) Experimental validation and performance analysis. To verify the performance of the improved mechanisms, this paper first pre-trains the proposed algorithm on the Gym game environments provided by the OpenAI platform to improve the accuracy of the algorithm model. Then, grid-map environments of different sizes and with multiple target points are built with Python and the Tkinter module; the original DQN algorithm and the improved new-DQN algorithm are run in these environments, and the superiority of the new-DQN algorithm is verified by analyzing the resulting data (a minimal grid-map sketch appears below).
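
The following Python sketch illustrates the kind of reward-to-ε mapping described in contribution (1). The abstract does not specify the functional form, so the sigmoid of the recent-return trend, the window size, and all parameter values below are illustrative assumptions, not the thesis's actual design.

    import random

    import numpy as np

    class AdaptiveEpsilonGreedy:
        """Reward-driven epsilon schedule (illustrative sketch, not the thesis design)."""

        def __init__(self, eps_min=0.05, eps_max=1.0, window=50, scale=1.0):
            self.eps_min = eps_min      # floor: never stop exploring entirely
            self.eps_max = eps_max      # ceiling: fully random at the start
            self.window = window        # number of recent episode returns to track
            self.scale = scale          # sensitivity of the reward-to-epsilon mapping
            self.eps = eps_max
            self.returns = []

        def update(self, episode_return):
            """Record an episode return and recompute epsilon from the return trend."""
            self.returns.append(episode_return)
            recent = self.returns[-self.window:]
            # Average change of recent returns: positive when learning is improving.
            trend = float(np.mean(np.diff(recent))) if len(recent) > 1 else 0.0
            # Sigmoid mapping (assumed form): improving returns push epsilon toward
            # eps_min (exploit more); stagnating returns keep epsilon near eps_max.
            frac = 1.0 / (1.0 + np.exp(self.scale * trend))
            self.eps = self.eps_min + (self.eps_max - self.eps_min) * frac

        def select_action(self, q_values):
            """Standard epsilon-greedy choice over a vector of Q-values."""
            if random.random() < self.eps:
                return random.randrange(len(q_values))
            return int(np.argmax(q_values))

Tying ε to the return trend rather than to a fixed decay schedule lets exploration rise again when learning stalls, which is one way to read the abstract's mapping between the reward value and the ε-greedy parameter.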
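
Contribution (2) adds a Q-value-dependent correction to the DQN target. The concrete correction below, a term proportional to the margin between the best and second-best target-network Q-values, is only one plausible instantiation of widening the gap between the optimal and suboptimal values; the thesis's exact correction function is not given in the abstract.

    import numpy as np

    def corrected_td_target(reward, next_q, done, gamma=0.99, beta=0.1):
        """Hypothetical corrected TD target for DQN (sketch only).

        reward : (batch,) array of rewards
        next_q : (batch, n_actions) target-network Q-values for next states
        done   : (batch,) array, 1.0 where the episode terminated
        beta   : assumed strength of the correction term
        """
        best = next_q.max(axis=1)                          # optimal action value
        second = np.partition(next_q, -2, axis=1)[:, -2]   # runner-up action value
        # The correction grows with the Q-value gap, so the optimal action's
        # contribution to the target stands out from suboptimal ones.
        corrected = best + beta * (best - second)
        return reward + gamma * (1.0 - done) * corrected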
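
For contribution (3), a minimal grid-map environment with obstacles and multiple target points might look as follows. The Tkinter rendering used in the thesis is omitted here, and the map size, obstacle layout, and reward values are illustrative assumptions rather than the experimental setup itself.

    class GridWorld:
        """Minimal grid-map environment with multiple target points (sketch)."""

        ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

        def __init__(self, size=8, obstacles=((2, 2), (3, 5)),
                     targets=((7, 7), (0, 7))):
            self.size = size                   # map side length; assumed value
            self.obstacles = set(obstacles)    # assumed obstacle layout
            self.targets = set(targets)        # multiple goal cells
            self.pos = (0, 0)

        def reset(self):
            self.pos = (0, 0)                  # agent starts in a fixed corner
            return self.pos

        def step(self, action):
            """Apply an action; return (state, reward, done)."""
            dr, dc = self.ACTIONS[action]
            r = min(max(self.pos[0] + dr, 0), self.size - 1)
            c = min(max(self.pos[1] + dc, 0), self.size - 1)
            if (r, c) in self.obstacles:       # blocked: penalty, stay in place
                return self.pos, -1.0, False
            self.pos = (r, c)
            if self.pos in self.targets:       # reaching any target ends the episode
                return self.pos, 10.0, True
            return self.pos, -0.1, False       # step cost encourages short paths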