
Research On Autonomous Trajectory Planning Of Manipulator Based On Deep Reinforcement Learning

Posted on: 2023-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Chi
Full Text: PDF
GTID: 2568307127982929
Subject: Electrical engineering
Abstract/Summary:
With social and technological development, the demand for manipulators continues to grow. Trajectory planning is an important research direction for manipulators, but traditional trajectory planning algorithms depend heavily on the environment: the trajectory must be planned in advance from a known environment before the manipulator can move, so these algorithms cannot adapt to unknown environments. Thanks to the rapid development of power electronics, computing, and related technologies, deep reinforcement learning has become an important research direction in artificial intelligence and has made progress in manipulator trajectory planning. This thesis therefore studies end-to-end trajectory planning of a manipulator using deep reinforcement learning.

First, the D-H parameter method of the manipulator is analyzed and its kinematic equations are derived. The FetchReach robot simulation environment, defined in an XML data format, is then introduced. To run this environment, a simulation platform based on the MuJoCo physics engine is built; the deep reinforcement learning algorithms are implemented with the PyTorch framework, the third-party Gym module is introduced, and development is carried out in PyCharm.

Second, because traditional algorithms such as DQN (Deep Q-Learning) and SARSA (State-Action-Reward-State-Action) cannot operate in continuous action spaces, the DDPG (Deep Deterministic Policy Gradient) algorithm is used to realize autonomous trajectory planning of the manipulator. With the initial DDPG algorithm, the simplistic binary reward function and the low sampling efficiency result in low training efficiency. Therefore, a real-time partition
reward function is designed and a prioritized experience replay mechanism is adopted. The real-time partition reward function computes the reward value dynamically according to real-time and partition principles, solving the binary reward function's poor adaptability to the environment; in experiments, the manipulator reaches a 100% trajectory planning success rate after about 250 training epochs. The prioritized experience replay mechanism assigns each sample a sampling probability reflecting its training value, overcoming the low sampling efficiency of the uniform experience replay mechanism and further improving training efficiency; experiments show that about 190 training epochs achieve a 100% success rate.

Finally, the improved DDPG algorithm still suffers from overestimation bias, high variance, and sparse rewards, which lead to low sample utilization. To address these problems, the TD3 (Twin Delayed Deep Deterministic Policy Gradient) algorithm combined with the HER (Hindsight Experience Replay) algorithm is adopted. TD3 mainly mitigates the overestimation bias and high variance of DDPG, while HER applies the idea of learning from failure to alleviate the sparse reward problem and improve sample utilization. Experiments show that the manipulator reaches a 100% trajectory planning success rate after only about 50 training epochs, and the algorithm's performance improves by 23%. This lays a foundation for subsequent migration to a physical platform.
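The contrast between the binary reward and a real-time partition reward can be sketched as follows. The success tolerance, distance bands, and penalty values below are illustrative assumptions; the abstract does not give the exact partition used in the thesis.

```python
import math

def binary_reward(pos, goal, tol=0.05):
    """Sparse binary reward as in FetchReach-style tasks: 0 on success, -1 otherwise."""
    d = math.dist(pos, goal)
    return 0.0 if d <= tol else -1.0

def partition_reward(pos, goal, tol=0.05, near=0.15):
    """Illustrative 'real-time partition' reward: the workspace is split into
    distance bands, and the reward is recomputed from the current distance at
    every step, so progress toward the goal is rewarded even before success.
    Band boundaries and values are assumptions, not the thesis's exact design."""
    d = math.dist(pos, goal)
    if d <= tol:    # success band
        return 0.0
    if d <= near:   # near band: shaping penalty proportional to distance
        return -d
    return -1.0     # far band: flat penalty, as in the binary case
```

Unlike the binary reward, the partition reward distinguishes "close but not there yet" from "far away", which gives the agent a gradient to follow before the first success.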
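The prioritized sampling idea can be sketched with the standard proportional-priority formulation (priority exponent alpha, importance-sampling correction beta); the parameter values below are common defaults from the literature, not values stated in the abstract.

```python
import numpy as np

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Prioritized experience replay: a sample's probability grows with its
    TD error, so informative transitions are replayed more often than under
    uniform sampling. alpha interpolates between uniform (0) and greedy (1)."""
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()

def importance_weights(probs, beta=0.4):
    """Importance-sampling weights that correct the bias introduced by
    non-uniform sampling, normalized by the maximum weight for stability."""
    n = len(probs)
    w = (n * probs) ** (-beta)
    return w / w.max()
```

Transitions with larger TD errors receive larger sampling probabilities, which is how the mechanism "reflects the training value of the sample" relative to uniform replay.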
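The TD3 and HER mechanisms can be illustrated with two small sketches: the clipped double-Q target that counters DDPG's overestimation bias, and the goal relabeling that turns a failed transition into a useful one under a sparse reward. The transition layout and the reward convention (0 on success, -1 otherwise) are illustrative assumptions.

```python
def td3_target(r, q1_next, q2_next, gamma=0.99, done=False):
    """TD3's clipped double-Q target: two target critics evaluate the next
    action and the smaller estimate is used, counteracting the overestimation
    bias that a single critic (as in DDPG) tends to accumulate."""
    q_next = min(q1_next, q2_next)
    return r + (0.0 if done else gamma * q_next)

def her_relabel(transition, achieved_goal):
    """HER's 'learning from failure': re-store a failed transition with the
    goal replaced by the state actually reached, so that under the sparse
    reward it counts as a success and yields a learning signal.
    The (s, a, r, s_next, goal) layout is an illustrative convention."""
    s, a, _, s_next, _ = transition
    return (s, a, 0.0, s_next, achieved_goal)
```

Taking the minimum of two critics makes the target pessimistic rather than optimistic, and relabeling densifies the reward signal without changing the environment, which is why the combination improves sample utilization.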
Keywords/Search Tags:Deep Reinforcement Learning, Manipulator, Trajectory Planning, Reward Function