
Robot Motion Control Method Based On Particle Swarm Optimization And Meta-reinforcement Learning

Posted on: 2023-11-16  Degree: Master  Type: Thesis
Country: China  Candidate: K Y Peng  Full Text: PDF
GTID: 2568306611487074  Subject: Engineering
Abstract/Summary:
Robot motion control refers to converting preset planning instructions into the required mechanical motion in a complex environment, so as to achieve precise control of quantities such as position, speed, torque, and acceleration. In recent years, artificial intelligence has developed rapidly in robot motion control, and deep reinforcement learning (DRL), as an important branch of artificial intelligence, has been widely applied; how to apply DRL to robot motion control has become a hot research topic. In DRL, the agent learns to perform a sequence of actions in an environment so as to maximize the cumulative reward. However, pure cumulative-reward optimization without a mechanism to encourage intelligent exploration may prevent the agent from learning correctly; for sparse-reward problems there may be no reward gradient to follow; DRL methods are sensitive to the choice of hyperparameters and often exhibit fragile convergence; and DRL policies tend to overfit the training environment. To address these problems, the main research of this thesis covers the following aspects:

(1) Because of the difficulty of handling sparse rewards, the lack of effective exploration, and convergence that is fragile and sensitive to hyperparameter choices, DRL algorithms are hard to apply to large-scale practical problems. Particle Swarm Optimization (PSO) is an evolutionary computation method whose main idea is to find optimal solutions through cooperation and information exchange among individuals in a population. This thesis combines PSO and DRL: the DRL component trains the policies with the lowest cumulative rewards in the population using the diverse data provided by the PSO population, and after each round of training it inserts the policies whose cumulative rewards have improved back into the PSO population, strengthening the information exchange between DRL and PSO. The resulting method, called PSO-RL, enhances the performance and stability of the DRL algorithm. Experiments on a series of challenging continuous-control tasks show that PSO-RL outperforms not only traditional deep reinforcement learning algorithms but also evolutionary reinforcement learning (ERL), which combines an evolutionary algorithm (EA) with DRL.

(2) In deep reinforcement learning, the agent must train its policy parameters from scratch for each task and must generate a large number of samples by interacting with the environment to train the neural network, which requires a large amount of training time. To address this, this thesis proposes an improved meta-learning algorithm, PSO-MAML, which embeds an improved particle swarm optimization algorithm into the meta-learning algorithm MAML. PSO-MAML adaptively learns not only the policy network parameters but also the particle weights used for swarm optimization. MAML trains the initialization parameters of a model so that it can quickly adapt to new tasks after a few fine-tuning steps; however, it suffers from slow exploration and from the difficulty of computing second derivatives during training. The proposed PSO-MAML performs population-based diversified exploration, which improves the generalization of the algorithm, while avoiding the calculation of second-order gradients, allowing more effective learning. Simulation results show that both the performance and the training time of PSO-MAML are better than those of ES-MAML, an improved MAML algorithm based on evolution strategies, demonstrating more effective exploration and better generalization.

(3) Using the general-purpose simulator PyBullet as the development environment, we investigate optimal policies based on the two improved algorithms above in a robotic-arm autonomous grasping environment. Because PyBullet is well suited to deep reinforcement learning compared with other simulators, this thesis uses the KUKA robotic-arm object-grasping task in PyBullet for simulation testing. The state of the environment is used as the input to the neural network; the robotic arm selects an action according to the probabilities of the various actions output by the network, receives an immediate reward, and then improves its policy based on the accumulated experience. The effectiveness of the two improved algorithms is thus verified: in the same environment, compared with traditional deep reinforcement learning algorithms, the proposed algorithms are better suited to this task, achieve a higher success rate, and, as training episodes increase, require fewer and fewer steps.
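The PSO-RL loop described in (1) can be sketched roughly as follows: a PSO population of policy parameter vectors is updated by the usual velocity rule, while a "DRL" step retrains the worst-performing members and reinserts them only when their cumulative reward improves. Everything below is illustrative, not the thesis implementation: the toy `cumulative_reward` stands in for an episode rollout, and the retraining step is a simple deterministic nudge rather than actual policy-gradient training.

```python
import random

random.seed(0)
DIM = 4  # toy policy parameter dimension

def cumulative_reward(params):
    # Stand-in for an episode rollout: reward peaks when all params equal 1.0.
    return -sum((p - 1.0) ** 2 for p in params)

def pso_rl(pop_size=10, iters=40, retrain_k=2, w=0.5, c1=1.5, c2=1.5):
    pop = [[random.uniform(-3, 3) for _ in range(DIM)] for _ in range(pop_size)]
    vel = [[0.0] * DIM for _ in range(pop_size)]
    pbest = [p[:] for p in pop]                      # personal bests
    gbest = max(pop, key=cumulative_reward)[:]       # global best
    for _ in range(iters):
        # Standard PSO velocity/position update.
        for i, p in enumerate(pop):
            for d in range(DIM):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - p[d])
                             + c2 * r2 * (gbest[d] - p[d]))
                p[d] += vel[i][d]
            if cumulative_reward(p) > cumulative_reward(pbest[i]):
                pbest[i] = p[:]
            if cumulative_reward(p) > cumulative_reward(gbest):
                gbest = p[:]
        # "DRL" step: retrain the retrain_k worst policies and reinsert them
        # into the population only if their cumulative reward improved.
        worst = sorted(range(pop_size),
                       key=lambda i: cumulative_reward(pop[i]))[:retrain_k]
        for i in worst:
            cand = [p + 0.1 * (1.0 - p) for p in pop[i]]  # toy "training" nudge
            if cumulative_reward(cand) > cumulative_reward(pop[i]):
                pop[i] = cand
    return gbest

best = pso_rl()
```

In the real method the nudge would be replaced by gradient-based DRL updates trained on experience gathered by the whole population, which is what gives PSO-RL its two-way information exchange.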
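The idea in (2), replacing MAML's gradient-based inner loop with a population search so that second-order gradients are never needed, can be illustrated on a one-dimensional toy problem. The task family, loss, and first-order meta-update below are hypothetical simplifications, not the thesis's PSO-MAML; the point is only the structure: an inner loop that adapts by sampling a small population of candidates, and an outer loop that updates the initialization using only the adapted parameters (a Reptile-style first-order update).

```python
import random

random.seed(1)

def task_loss(theta, target):
    # Each "task" is fitting a scalar target; loss is squared error.
    return (theta - target) ** 2

def adapt(theta, target, pop_size=5, steps=3, sigma=0.5):
    # Inner-loop adaptation via a small population search instead of a
    # gradient step, so the meta-update never needs second derivatives.
    best = theta
    for _ in range(steps):
        cands = [best + random.gauss(0, sigma) for _ in range(pop_size)] + [best]
        best = min(cands, key=lambda th: task_loss(th, target))
    return best

def meta_train(meta_iters=200, lr=0.3):
    theta = 0.0                                # meta-initialization
    for _ in range(meta_iters):
        target = random.uniform(2.0, 4.0)      # sample a task
        adapted = adapt(theta, target)         # zeroth-order inner loop
        theta += lr * (adapted - theta)        # first-order meta-update
    return theta

theta = meta_train()
```

After meta-training, `theta` sits near the center of the task distribution, so a few inner-loop steps suffice to fit any new task, exactly the fast-adaptation property MAML aims for, but obtained without ever differentiating through the inner loop.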
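The action-selection step described in (3), where the arm picks an action in proportion to the probabilities output by the neural network, is in essence a softmax policy sampled categorically. A minimal sketch, with hypothetical logits standing in for the network's outputs:

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for stability
    s = sum(exps)
    return [e / s for e in exps]

def select_action(logits, rng=random.random):
    # Sample an action index in proportion to its softmax probability.
    probs = softmax(logits)
    r, acc = rng(), 0.0
    for action, p in enumerate(probs):
        acc += p
        if r < acc:
            return action
    return len(probs) - 1  # guard against floating-point rounding

logits = [0.1, 2.0, -1.0, 0.5]  # illustrative network outputs, one per action
probs = softmax(logits)
```

Because actions are sampled rather than chosen greedily, the arm still explores low-probability actions occasionally, which matters for the sparse-reward grasping setting discussed above.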
Keywords/Search Tags: Deep reinforcement learning, Particle swarm optimization, Meta-learning, Motion control