
Option Learning Method Research With Double Actor-Critic Architecture

Posted on: 2024-05-04    Degree: Master    Type: Thesis
Country: China    Candidate: Y P Xu    Full Text: PDF
GTID: 2568306941964059    Subject: Computer technology
Abstract/Summary:
Hierarchical reinforcement learning is an important branch of reinforcement learning, mainly used to solve complex problems such as long-horizon sequential decision making and sparse rewards, and it is currently one of the research hotspots in the field. As a main research direction of hierarchical reinforcement learning, option learning methods build powerful hierarchical decision-making ability through temporal abstraction and have achieved good results in many real-world domains, such as autonomous driving, recommender systems, natural language processing, and games. In option learning algorithms, giving each option a certain problem-solving ability helps the agent obtain higher environmental rewards by using options rationally in complex tasks, and introducing the double actor-critic architecture is an effective way to activate all options equally. This thesis focuses on the low sample efficiency, unstable policies, and slow training of option learning algorithms based on the double actor-critic architecture, and makes the following three contributions:

i. The original double actor-critic architecture supports only on-policy learning, which results in low sample efficiency. To address this issue, a double actor-critic option learning method based on off-policy updates is proposed. It improves sample efficiency by introducing off-policy training with random, non-sequential sampling, and adds information constraints to further improve structural stability. Multiple sets of experiments show that, compared with other methods, it significantly improves sample efficiency while keeping the hierarchical structure stable.

ii. Existing option learning methods consider only state information when the high-level policy guides low-level actions, which under-uses option information and leaves the low-level policies unstable. To address this issue, a double actor-critic option learning method based on trajectory information is proposed. It uses different types of information in option trajectories for guidance and introduces an option similarity measure as an intrinsic reward to improve the stability of the low-level policies. Multiple sets of experiments show that this algorithm has better stability.

iii. Traditional experience replay cannot effectively exploit the excellent experiences in the replay buffer, which reduces training efficiency; moreover, traditional methods consider only one aspect of what makes an experience excellent and do not fully reflect the option-combination ability of the hierarchy. To address this issue, an option learning method based on superior experience is proposed. It classifies transitions by option advantage information and controls the sampling ratio of superior experiences, improving the utilization of excellent historical samples and accelerating agent training. Multiple sets of experiments show that this method speeds up agent training while achieving high performance.
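As a rough illustration of the off-policy mechanism described in contribution i, the sketch below shows an option-level replay buffer with random, non-sequential minibatch sampling. The class name, capacity, and transition layout are illustrative assumptions rather than the thesis's actual implementation, and the information constraint is omitted.

```python
import random
from collections import deque

class OptionReplayBuffer:
    """Fixed-size buffer of option-level transitions for off-policy updates
    (illustrative sketch; names and layout are assumptions)."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, option, action, reward, next_state, done):
        # Store one transition; the oldest is discarded once capacity is reached.
        self.buffer.append((state, option, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Random, non-sequential sampling breaks the temporal correlation between
        # consecutive transitions, which is what enables off-policy minibatch updates.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```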
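The intrinsic reward of contribution ii can be pictured as a similarity bonus added to the environment reward. A minimal sketch follows, assuming a cosine similarity between a hypothetical option embedding and a feature of the recent sub-trajectory; the specific similarity measure, the feature construction, and the weight beta are assumptions for illustration, not necessarily the measure used in the thesis.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    # Similarity in [-1, 1] between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def shaped_reward(env_reward, option_embedding, trajectory_feature, beta=0.1):
    # Intrinsic bonus: how closely the recent sub-trajectory matches the behaviour
    # the active option is expected to produce; beta is an illustrative weight.
    return env_reward + beta * cosine_similarity(option_embedding, trajectory_feature)
```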
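Contribution iii controls how often superior (high option-advantage) experiences are replayed. Below is a minimal sketch assuming transitions have already been split into two list-like buffers by an advantage threshold; the threshold, the sampling ratio, and the two-buffer split are illustrative assumptions, not the thesis's actual scheme.

```python
import random

def is_superior(option_advantage, threshold=0.0):
    # A transition counts as "superior" if its option advantage exceeds a threshold
    # (the threshold value here is an assumption for illustration).
    return option_advantage > threshold

def sample_mixed(superior_buffer, ordinary_buffer, batch_size=64, superior_ratio=0.5):
    # Draw a minibatch in which a controlled fraction comes from the superior buffer,
    # so high-advantage experiences are replayed more often but not exclusively.
    n_sup = min(int(batch_size * superior_ratio), len(superior_buffer))
    n_ord = min(batch_size - n_sup, len(ordinary_buffer))
    return random.sample(superior_buffer, n_sup) + random.sample(ordinary_buffer, n_ord)
```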
Keywords/Search Tags:hierarchical reinforcement learning, double actor-critic architecture, off-policy, trajectory information, superior experience