Research On Path Planning Algorithm Based On Deep Reinforcement Learning

Posted on: 2024-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: J X Cao
Full Text: PDF
GTID: 2568307055470484
Subject: Electronic information
Abstract/Summary:
With the rapid development and growing adoption of intelligent technology, demands on path planning have been rising. To cope with increasingly complex environments and solve intricate tasks, researchers have proposed path planning algorithms based on deep reinforcement learning. The Deep Q Network (DQN) algorithm, a classic deep reinforcement learning method, has been widely applied to path planning. In single-objective path planning research, improvements to the way environmental information is processed have made that information increasingly simplified; however, simplified environmental information can slow the convergence of complex network structures. In addition, fixed reward values in the reward function prevent the network from being updated effectively, resulting in slow convergence. Furthermore, in mandatory-waypoint path planning, the ERDQN algorithm suffers from overestimated Q-values and inefficient use of high-quality data in the experience pool, which also lowers the network's update speed. To address these problems, this thesis investigates the following:

(1) To tackle the slow convergence caused by complex network structures and fixed reward values in the reward function, this thesis proposes an improved DQN path planning algorithm. First, a Multi-Layer Perceptron is incorporated into the DQN algorithm, so that the network is updated iteratively through the Multi-Layer Perceptron, accelerating the convergence of the neural network. Second, a cubic function is introduced into the reward function, yielding a dynamic reward function. Finally, an adaptive action selection mechanism is implemented that lets the agent rely on the reward function for action selection more often in the later stages of path planning, avoiding the problem of missing better actions due to a high probability of random action selection (a sketch of these mechanisms follows this abstract). Experimental results demonstrate that the proposed improvements lead to faster network convergence and more efficient path planning.

(2) To address the ERDQN algorithm's overestimation of Q-values and inefficient use of high-quality data in the experience pool, both of which hinder the network's update speed, this thesis proposes the PER-ERD3QN algorithm. First, ERDQN is combined with Double Q Networks and Dueling Networks to mitigate Q-value overestimation. Second, a prioritized experience replay mechanism is introduced to make better use of the superior data in the experience pool (see the second sketch after this abstract). Experimental results show that PER-ERD3QN achieves higher average scores and average rewards, indicating a faster network update speed.
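As a rough illustration of the mechanisms in (1), the sketch below shows one plausible form of a dynamic reward function with a cubic term and an adaptive (decaying) action selection rule. The cubic shaping form and the epsilon schedule are assumptions made for illustration; the abstract does not state their exact formulas.

```python
import random

def dynamic_reward(step: int, max_steps: int, base_reward: float) -> float:
    """Dynamic reward sketch: scale the base reward by a cubic function
    of planning progress, so reward magnitudes grow toward the end of
    an episode instead of staying fixed. (Assumed form, not the
    thesis's exact function.)"""
    progress = step / max_steps            # progress in [0, 1]
    return base_reward * progress ** 3     # cubic shaping term

def select_action(q_values, episode: int, total_episodes: int) -> int:
    """Adaptive action selection sketch: exploration probability decays
    as training progresses, so later episodes exploit the learned,
    reward-driven Q-values more often. (Assumed decay schedule.)"""
    epsilon = max(0.05, 1.0 - episode / total_episodes)
    if random.random() < epsilon:
        return random.randrange(len(q_values))                   # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```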
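The components combined in (2) are standard deep-RL techniques, so a minimal sketch can show how they fit together: a dueling Q-network head, a double-DQN target in which the online network selects the next action and the target network evaluates it, and priority weights for prioritized experience replay. All class and parameter names below are illustrative; this is not the thesis's PER-ERD3QN implementation.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: separate state-value V(s) and advantage A(s, a)
    streams, recombined as Q = V + (A - mean(A))."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        h = self.feature(s)
        a = self.advantage(h)
        # Subtracting the mean advantage keeps V and A identifiable.
        return self.value(h) + a - a.mean(dim=1, keepdim=True)

def double_dqn_target(online, target, r, s_next, done, gamma=0.99):
    """Double-DQN target: the online net picks argmax-a, the target net
    evaluates it; decoupling selection from evaluation curbs the
    max-operator's Q-value overestimation."""
    with torch.no_grad():
        a_star = online(s_next).argmax(dim=1, keepdim=True)
        q_next = target(s_next).gather(1, a_star).squeeze(1)
        return r + gamma * (1.0 - done) * q_next

def per_priority(td_error: torch.Tensor, alpha: float = 0.6) -> torch.Tensor:
    """Prioritized experience replay: sample transitions with probability
    proportional to |TD error|^alpha, so informative (high-error)
    experiences in the pool are replayed more often."""
    return (td_error.abs() + 1e-6) ** alpha
```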
Keywords/Search Tags: Path planning, Deep Q Network, Multi-Layer Perceptron, Dynamic reward function, PER-ERD3QN