With the development of deep learning, deep reinforcement learning has also made major breakthroughs and is widely used in various fields. Conventional reinforcement learning usually assumes that the environment is fixed and requires the agent to collect a large amount of data from that environment in order to adapt to it, so the agent's generalization is usually poor: an agent trained in one environment is difficult to transfer to another, even if the environment has changed only slightly. How to improve the generalization of reinforcement learning and help the agent adapt quickly to new environments has therefore become an important research topic in the field. In recent years, researchers have extended meta-learning to reinforcement learning and proposed meta-reinforcement learning to improve generalization. However, these algorithms generally ignore the decoupling of environment information from policy information, which makes it difficult to achieve good performance in new environments. In this paper, we propose the Policy Adaptation with Decoupled Representation (PAnDR) algorithm, which combines contrastive learning and mutual information for rapid policy adaptation. In the training phase, we train an environment representation network on offline data using contrastive learning, and train the policy representation and action-selection networks with a prediction objective. We then decouple the policy representation from the environment representation by minimizing the mutual information between them, while simultaneously maximizing mutual information to preserve the completeness of each representation. Finally, we train a value function approximation network conditioned on the policy and environment representations, and optimize the policy by gradient ascent through this network. Compared with traditional meta-reinforcement learning algorithms, PAnDR is trained on offline datasets, which saves the interaction cost of reinforcement learning. In the fast adaptation phase, experiments show that PAnDR can quickly adapt to a new environment with only a small amount of interaction and outperforms existing methods.
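To make the final adaptation step concrete, the following PyTorch sketch illustrates the idea described above: a value network trained on (environment representation, policy representation) pairs is frozen, and the policy representation is refined by gradient ascent on the predicted value for the new environment's representation. The network architecture, embedding sizes, names, and hyperparameters below are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

ENV_DIM, POLICY_DIM = 16, 16  # assumed embedding sizes (illustrative only)


class ValueNet(nn.Module):
    """Approximates V(z_env, z_pi): the expected return of a policy embedding
    when executed in an environment with the given environment embedding."""

    def __init__(self, env_dim: int = ENV_DIM, policy_dim: int = POLICY_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(env_dim + policy_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, z_env: torch.Tensor, z_pi: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z_env, z_pi], dim=-1))


def adapt_policy_embedding(value_net: ValueNet,
                           z_env: torch.Tensor,
                           z_pi_init: torch.Tensor,
                           steps: int = 100,
                           lr: float = 1e-2) -> torch.Tensor:
    """Gradient-ascent adaptation: the value network and the environment
    embedding stay fixed; only the policy embedding is updated to maximize
    the predicted value."""
    z_pi = z_pi_init.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([z_pi], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = -value_net(z_env, z_pi).mean()  # ascend on value = descend on its negative
        loss.backward()
        optimizer.step()
    return z_pi.detach()


if __name__ == "__main__":
    value_net = ValueNet()
    for p in value_net.parameters():     # value network is frozen during adaptation
        p.requires_grad_(False)
    z_env = torch.randn(1, ENV_DIM)      # representation of the new environment
    z_pi0 = torch.randn(1, POLICY_DIM)   # initial policy representation
    z_pi_adapted = adapt_policy_embedding(value_net, z_env, z_pi0)
    print(value_net(z_env, z_pi_adapted).item())
```

In a full system, the adapted policy embedding would then condition the action-selection network to produce the executable policy; that decoding step, as well as the contrastive and mutual-information training of the representations, is outside the scope of this sketch.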