
Research On Reinforcement Learning Algorithms For Complex Problems

Posted on: 2022-05-05    Degree: Master    Type: Thesis
Country: China    Candidate: F Y Liu    Full Text: PDF
GTID: 2518306323462434    Subject: Computer application technology
Abstract/Summary:
Reinforcement learning is a class of machine learning methods widely regarded as one of the key techniques for achieving artificial general intelligence. As problems in real-world application scenarios grow more complex, the study of efficient reinforcement learning algorithms has received increasing attention. On the one hand, to solve complex problems, reinforcement learning methods often use deep neural networks to represent policies and value functions, which leads to non-convex and non-smooth optimization problems. Derivative-based reinforcement learning methods therefore easily fall into local optima, while derivative-free reinforcement learning methods avoid this problem but suffer from extremely low sample efficiency when the problem dimension is high. How to improve the sample efficiency of derivative-free reinforcement learning methods is thus worth studying. On the other hand, many complex real-world problems contain multiple different, and possibly conflicting, objectives, whereas most reinforcement learning methods assume a single objective, and existing multi-objective reinforcement learning algorithms suffer from poor scalability, time-consuming training, and low quality of the obtained Pareto set. How to effectively remedy these shortcomings of multi-objective reinforcement learning methods is also worth studying. The main work of this paper includes:

(1) An ES-based derivative-free reinforcement learning algorithm, SGES, is proposed to solve high-dimensional reinforcement learning problems. This work addresses the high variance of the gradient estimator in ES, the representative derivative-free reinforcement learning algorithm, which leads to its low sample efficiency. The proposed SGES algorithm effectively reduces the variance of the gradient estimator by using historical gradient estimates to construct a gradient subspace and its orthogonal complement (see the first sketch below). This paper shows that the variance of the gradient estimator in SGES can be much smaller than that of the gradient estimator in ES. Experimental results verify this theoretical conclusion and also show the superiority of SGES over other algorithms.

(2) A multi-objective reinforcement learning algorithm based on meta-learning, PG-Meta-MORL, is proposed to solve multi-objective reinforcement learning problems. PG-Meta-MORL models the multi-objective reinforcement learning problem as a meta-learning problem: it iteratively optimizes a meta-policy over multiple tasks selected by a fitted prediction model, which guides the overall optimization in the direction that best improves the quality of the current Pareto set (see the second sketch below). Experimental results show that PG-Meta-MORL not only finds a high-quality approximate Pareto set but also adapts quickly to newly given objective preferences.
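The following is a minimal, illustrative sketch of the subspace-guided gradient estimation idea behind SGES in (1), not the thesis's implementation: an antithetic ES estimator that draws some search directions from the subspace spanned by recent gradient estimates and the rest from its orthogonal complement. The function name sges_gradient and the parameters alpha, n_pairs, and sigma are assumptions made for illustration.

```python
import numpy as np

def sges_gradient(f, theta, grad_history, n_pairs=16, sigma=0.05, alpha=0.5, rng=None):
    """Illustrative antithetic ES gradient estimate for maximizing f(theta).

    With probability `alpha` a search direction is drawn from the subspace
    spanned by recent gradient estimates, otherwise from its orthogonal
    complement; without history it falls back to plain isotropic sampling.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = theta.size
    U = None
    if grad_history:
        # Orthonormal basis of the subspace spanned by historical gradient estimates.
        U, _ = np.linalg.qr(np.stack(grad_history, axis=1))
    grad = np.zeros(d)
    for _ in range(n_pairs):
        z = rng.standard_normal(d)
        if U is None:
            eps = z                          # plain isotropic ES direction
        elif rng.random() < alpha:
            eps = U @ (U.T @ z)              # direction inside the gradient subspace
        else:
            eps = z - U @ (U.T @ z)          # direction in the orthogonal complement
        eps /= np.linalg.norm(eps)
        # Antithetic finite-difference estimate of the directional derivative.
        grad += (f(theta + sigma * eps) - f(theta - sigma * eps)) / (2 * sigma) * eps
    return grad / n_pairs
```

In use, grad_history would hold the last few estimates returned by this function (for example, a deque of length 5), and theta would then be updated with a step along the estimated gradient.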
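Below is a loose sketch of the kind of workflow described in (2), assuming a Reptile-style outer update, a weighted-sum scalarization of the objectives, a two-point ES inner loop standing in for policy gradients, and a caller-supplied improvement_model standing in for the fitted prediction model; these choices and all names are illustrative assumptions, not the algorithm as specified in the thesis.

```python
import numpy as np

def meta_morl_step(meta_params, candidate_weights, evaluate_objectives,
                   improvement_model, n_tasks=4, inner_steps=5,
                   inner_lr=0.02, meta_lr=0.1, sigma=0.05, rng=None):
    """One meta-update over scalarized multi-objective tasks.

    `evaluate_objectives(params)` returns the vector of objective returns, and
    `improvement_model(w)` predicts how much training on preference vector `w`
    would improve the current Pareto set (both supplied by the caller).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Select the preference vectors with the largest predicted improvement.
    scores = np.array([improvement_model(w) for w in candidate_weights])
    chosen = [candidate_weights[i] for i in np.argsort(scores)[-n_tasks:]]

    adapted = []
    for w in chosen:
        params = meta_params.copy()
        for _ in range(inner_steps):
            # Two-point ES step on the weighted-sum scalarized task.
            eps = rng.standard_normal(params.size)
            plus = w @ evaluate_objectives(params + sigma * eps)
            minus = w @ evaluate_objectives(params - sigma * eps)
            params = params + inner_lr * (plus - minus) / (2 * sigma) * eps
        adapted.append(params)

    # Reptile-style outer step: move the meta-policy toward the adapted policies.
    return meta_params + meta_lr * (np.mean(adapted, axis=0) - meta_params)
```

In this sketch, the per-preference adapted policies would serve as candidates for the approximate Pareto set, and the prediction model would be refitted on the observed improvements after each meta-iteration.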
Keywords/Search Tags:Reinforcement learning, Derivative-free optimization, Evolution strategies, Multi-objective optimization, Meta-learning