With the development of deep learning, deep reinforcement learning has also made major breakthroughs and is widely used in various fields. Conventional reinforcement learning usually assumes that the environment is fixed and requires the agent to collect a large amount of data from that environment in order to adapt to it, so the agent's generalization is usually poor: an agent trained in one environment is difficult to transfer to another, even if the environment has changed only slightly. How to improve the generalization of reinforcement learning and help the agent adapt quickly to new environments has therefore become an important research topic in the field. In recent years, researchers have extended meta-learning to reinforcement learning and proposed meta-reinforcement learning to improve generalization. However, these algorithms generally ignore the decoupling of environment information from policy information, which makes it difficult to achieve good performance in new environments. In this paper, we propose the Policy Adaptation with Decoupled Representation (PAnDR) algorithm, which combines contrastive learning and mutual information for rapid policy adaptation. In the training phase, we train an environment representation network on offline data using contrastive learning, and train the policy representation and action-selection networks with a prediction objective. We then decouple the policy representation from the environment representation by minimizing the mutual information between them, while simultaneously maximizing mutual information to preserve the completeness of each representation. Finally, we train a value function approximation network conditioned on the policy and environment representations, and optimize the policy by gradient ascent through this network. Compared with traditional meta-reinforcement learning algorithms, PAnDR is trained on offline datasets, which saves the interaction cost of reinforcement learning. In the fast adaptation phase, experiments show that PAnDR can quickly adapt to a new environment with only a small amount of interaction and outperforms existing methods.
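To make the final adaptation step concrete, the following PyTorch sketch illustrates the idea described above: a value network trained on (environment representation, policy representation) pairs is frozen, and the policy representation is refined by gradient ascent on the predicted value for the new environment's representation. The network architecture, embedding sizes, names, and hyperparameters below are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

ENV_DIM, POLICY_DIM = 16, 16  # assumed embedding sizes (illustrative only)


class ValueNet(nn.Module):
    """Approximates V(z_env, z_pi): the expected return of a policy embedding
    when executed in an environment with the given environment embedding."""

    def __init__(self, env_dim: int = ENV_DIM, policy_dim: int = POLICY_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(env_dim + policy_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, z_env: torch.Tensor, z_pi: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z_env, z_pi], dim=-1))


def adapt_policy_embedding(value_net: ValueNet,
                           z_env: torch.Tensor,
                           z_pi_init: torch.Tensor,
                           steps: int = 100,
                           lr: float = 1e-2) -> torch.Tensor:
    """Gradient-ascent adaptation: the value network and the environment
    embedding stay fixed; only the policy embedding is updated to maximize
    the predicted value."""
    z_pi = z_pi_init.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([z_pi], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = -value_net(z_env, z_pi).mean()  # ascend on value = descend on its negative
        loss.backward()
        optimizer.step()
    return z_pi.detach()


if __name__ == "__main__":
    value_net = ValueNet()
    for p in value_net.parameters():     # value network is frozen during adaptation
        p.requires_grad_(False)
    z_env = torch.randn(1, ENV_DIM)      # representation of the new environment
    z_pi0 = torch.randn(1, POLICY_DIM)   # initial policy representation
    z_pi_adapted = adapt_policy_embedding(value_net, z_env, z_pi0)
    print(value_net(z_env, z_pi_adapted).item())
```

In a full system, the adapted policy embedding would then condition the action-selection network to produce the executable policy; that decoding step, as well as the contrastive and mutual-information training of the representations, is outside the scope of this sketch.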