Deep reinforcement learning combines the representational power of deep neural networks with the decision-making capability of reinforcement learning, and it has been widely applied across many fields with good performance. However, deep reinforcement learning struggles to achieve good performance in more complex task scenarios. Deep hierarchical reinforcement learning adopts the idea of divide and conquer: by decomposing a large-scale task into small-scale subtasks and solving them separately, it can effectively alleviate the "curse of dimensionality" and the sparse-reward problem, both of which are difficult for traditional reinforcement learning to handle. The Option-Critic framework is a mainstream framework in deep hierarchical reinforcement learning research; through the policy gradient theorem it achieves end-to-end learning of intra-option policies and termination functions. However, the Option-Critic framework suffers from degradation during policy learning, such as options in the option set becoming similar to one another, low knowledge-transfer ability of the lower-level policies, and limited exploration ability of the agent. To address these problems, this work studies option diversity, policy transfer, and guaranteed exploration through optimized clipping parameters, covering the following three aspects:

i. During policy learning with the Option-Critic framework, the options in the option set tend to become similar. To address this problem, an Option-Critic Algorithm with Mutual Information Optimization (MIOC) is proposed. MIOC introduces the mutual information between options and actions as an intrinsic reward, which encourages different options to take different actions in the same state and thereby ensures diversity among options (see the first sketch below). Comparative experiments in several continuous environments verify that the method preserves option diversity and improves performance.

ii. Relying on an intrinsic drive to guarantee option diversity can slow learning and yield policies with low knowledge transferability. To address this problem, a Diversity-Enriched Option-Critic Algorithm with Interest Function Optimization (DEOC-IF) is proposed. By introducing interest functions that restrict which lower-level policies the upper-level policy may select (see the second sketch below), DEOC-IF not only ensures the diversity of the option set but also lets the learned intra-option policies focus on different regions of the state space, which improves the knowledge-transfer ability of the algorithm and accelerates learning. Experimental results show that the algorithm is effective.

iii. A fixed clipping parameter can leave the agent with insufficient exploration in the early stage of policy training and degrade the experimental results. To address this problem, a Proximal Policy Option-Critic Algorithm Based on an Optimized Clipping Parameter (OCP) is proposed. OCP introduces two decaying forms of the clipping parameter to constrain updates of the lower-level policy (see the third sketch below), giving the agent sufficient exploration ability at the start of training while keeping policy updates stable at the end. Comparative experiments in a continuous environment show that the algorithm learns faster and achieves better performance.
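To make the idea in (i) concrete, the following minimal sketch shows one way a mutual-information-style intrinsic reward between the active option and the chosen action could be computed. The function name `mutual_info_bonus`, the weighting coefficient `beta`, and the uniform marginal over options are illustrative assumptions, not the exact formulation used by MIOC.

```python
import numpy as np

def mutual_info_bonus(action_probs_per_option, option, action):
    """Pointwise mutual-information term between the active option and the
    chosen action: log pi(a|s,o) - log pi_bar(a|s), where pi_bar marginalises
    over options.  The bonus is large when an option prefers actions that the
    other options avoid, which rewards diverse option behaviour."""
    p_a_given_o = action_probs_per_option[option, action]
    p_a_marginal = action_probs_per_option[:, action].mean()  # assumes a uniform prior over options
    return np.log(p_a_given_o + 1e-8) - np.log(p_a_marginal + 1e-8)

# Two options over three discrete actions in the same state.
probs = np.array([[0.8, 0.1, 0.1],   # option 0 prefers action 0
                  [0.1, 0.1, 0.8]])  # option 1 prefers action 2
beta = 0.1                           # weight of the intrinsic term (assumed value)
r_total = 1.0 + beta * mutual_info_bonus(probs, option=0, action=0)  # environment reward plus bonus
```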
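The interest-function mechanism described in (ii) can be pictured as re-weighting the upper-level policy over options. The sketch below is an assumed illustration of that gating, not the exact parameterisation used by DEOC-IF.

```python
import numpy as np

def option_selection_probs(policy_over_options, interests):
    """Re-weight the policy over options by per-option interest values and
    renormalise.  Options whose interest is low in the current state are rarely
    selected, so each option specialises on its own region of the state space."""
    weighted = policy_over_options * interests
    return weighted / weighted.sum()

pi_omega = np.array([0.25, 0.25, 0.25, 0.25])  # upper-level policy over 4 options in state s
interest = np.array([0.9, 0.1, 0.6, 0.05])     # interest function values I_o(s) (assumed)
probs = option_selection_probs(pi_omega, interest)
option = np.random.choice(len(probs), p=probs)  # sample the option to execute
```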
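The decaying clipping parameter of (iii) can be illustrated with a simple schedule plugged into a PPO-style clipped surrogate. The two decay forms (linear and exponential) and their constants are placeholders, since the exact schedules used by OCP are not stated here.

```python
import numpy as np

def clip_epsilon(step, total_steps, eps_start=0.3, eps_end=0.1, mode="linear"):
    """Decaying clipping parameter: a large epsilon early in training permits
    wider policy updates (more exploration), while a small epsilon late in
    training keeps policy updates stable."""
    frac = min(step / total_steps, 1.0)
    if mode == "linear":
        return eps_start + (eps_end - eps_start) * frac
    return eps_end + (eps_start - eps_end) * np.exp(-5.0 * frac)  # exponential decay

def clipped_surrogate(ratio, advantage, eps):
    """PPO-style clipped objective evaluated with the scheduled epsilon."""
    return np.minimum(ratio * advantage, np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

eps = clip_epsilon(step=2_000, total_steps=100_000)            # early in training -> larger epsilon
loss_term = -clipped_surrogate(ratio=1.4, advantage=0.7, eps=eps)
```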