
Research And Application Of Reward Shaping Based Reinforcement Learning

Posted on: 2023-08-27 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Y L Dong | Full Text: PDF
GTID: 1528307043467024 | Subject: Control Science and Engineering
Abstract/Summary:
Reinforcement learning is one of the methodologies of machine learning: an agent maximizes rewards by learning value functions or policies to achieve specific goals while interacting with its environment. Reinforcement learning has been widely applied in areas such as robotics, the game of Go, and game intelligence. However, reinforcement learning still faces many problems, such as the design of reward functions, long training times, and unstable training processes.

Aiming at the low training efficiency and low stability of reinforcement learning algorithms, this dissertation conducts a study based on reward shaping from the perspective of the reward function and proposes a reward-shaping-based theoretical framework to optimize the training process. This framework shapes the original reward function so that the shaped reward can be used in a subsequent reinforcement learning algorithm while taking into account both training efficiency and training stability, thereby guiding and optimizing the training process of reinforcement learning in theory and practice and significantly improving training efficiency and stability. The main contributions of this dissertation are as follows:

To address the difficulty of selecting and determining the potential function in reward shaping, this dissertation formulates the theoretical framework of reinforcement learning as a control optimization problem and derives a theoretically guaranteed method for determining the potential function from the perspective of Lyapunov stability analysis, so as to improve the learning efficiency of reinforcement learning and speed up training. A convergence proof of the proposed method is given via stochastic approximation theory. Furthermore, this dissertation verifies that the proposed method substantially improves training efficiency in 3 discrete action environments as well as 3 continuous action environments.

To address the problem of frequent reward oscillations during reinforcement learning training, this dissertation investigates the statistical properties of reward trajectories and proposes a smoothing reward shaping method that provably reduces the variance of reward trajectories; its theoretical guarantees are likewise derived by stochastic approximation theory, thus improving the learning stability of reinforcement learning and significantly increasing its applicability. The proposed method is verified to improve training stability on 3 algorithms and 4 continuous control benchmarks.

To take both training efficiency and training stability into account, this dissertation identifies the optimal hyperparameter combination by applying the expectation-maximization algorithm to the hidden variables, reducing the cost of hyperparameter search, and achieves a balance between learning efficiency and stability by means of reward fusion, thereby optimizing the reinforcement learning training process. The proposed optimization architecture is verified to significantly improve training performance on various benchmarks, at a computational cost of only 2% of that of grid search.

To verify the effectiveness of the proposed framework, a study is conducted on the manipulation of an 8 degree-of-freedom dexterous robotic hand based on the optimized reinforcement learning training process described above. The training system is built in PyBullet, and the policies trained in simulation are transferred to the real manipulator using domain randomization to achieve a continuous valve-rotation task. This practical application verifies the effectiveness of the proposed framework and demonstrates its significance for large-scale applications of reinforcement learning.

Finally, this dissertation summarizes the full text. The proposed theoretical framework shapes the original reward function so that the shaped reward function can be used in a subsequent reinforcement learning algorithm with both training efficiency and training stability, significantly improving the practicality and applicability of reinforcement learning algorithms. This dissertation also looks forward to deeper analysis of the influence of the reward function on reinforcement learning algorithms, so as to establish a more general theoretical framework.
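The abstract gives no formulas, but the shaping operation the framework builds on follows the standard potential-based scheme, in which the term F(s, s') = γΦ(s') − Φ(s) is added to the original reward without changing the optimal policy. The sketch below illustrates that scheme only; the potential `phi` here is a toy placeholder, not the Lyapunov-derived potential the dissertation actually proposes.

```python
# Sketch of potential-based reward shaping: the shaped reward adds
# F(s, s') = gamma * phi(s') - phi(s) to the original reward, which
# preserves the optimal policy. `phi` below is a hypothetical toy
# potential (negative distance to a goal at 0 for a 1-D state), not
# the Lyapunov-based potential derived in the dissertation.

GAMMA = 0.99  # discount factor


def phi(state):
    """Toy potential: negative distance of a 1-D state from the goal at 0."""
    return -abs(state)


def shaped_reward(reward, state, next_state, done):
    """Original reward plus the potential-based shaping term."""
    next_potential = 0.0 if done else phi(next_state)  # terminal potential is 0
    return reward + GAMMA * next_potential - phi(state)
```

Moving from state 2.0 to state 1.0 (closer to the goal) yields a positive shaping bonus on top of the environment reward, which is what accelerates learning when the potential is well chosen.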
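The smoothing reward shaping method is described only at a high level in the abstract. A minimal sketch in the same spirit, under the assumption that smoothing means filtering the reward trajectory, is an exponential moving average over rewards; the filter form and the factor `alpha` are illustrative choices, not details taken from the dissertation.

```python
# Minimal reward-smoothing sketch: replace each raw reward with an
# exponential moving average, which reduces the variance of the reward
# trajectory seen by the learner. The filter and its smoothing factor
# `alpha` are assumptions for illustration, not the dissertation's
# actual smoothing operator.

class RewardSmoother:
    def __init__(self, alpha=0.1):
        self.alpha = alpha  # smoothing factor in (0, 1]; smaller = smoother
        self.ema = None     # running exponential moving average

    def __call__(self, reward):
        if self.ema is None:
            self.ema = reward
        else:
            self.ema = (1.0 - self.alpha) * self.ema + self.alpha * reward
        return self.ema
```

Fed an oscillating reward sequence such as +1, −1, +1, −1, ..., the smoother returns values of much smaller magnitude, so the variance of the trajectory handed to the learning algorithm drops, which is the stability effect the dissertation targets.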
Keywords/Search Tags: Reinforcement Learning, Reward Shaping, Stochastic Approximation, Expectation Maximization, Dexterous Robotic Hand Manipulation