Generalization Improvement Of Reinforcement Learning Based On Adversarial Discriminative Feature Separation Network

Posted on: 2024-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2568307052995919
Subject: Electronic information
Abstract/Summary:
Deep neural networks (DNNs) have demonstrated powerful feature extraction and fitting capabilities in many tasks, and by taking advantage of DNNs, deep reinforcement learning (DRL) has performed well in several fields. However, standard DRL tasks use the same scenarios for training and testing, which makes DRL agents highly susceptible to overfitting to the training scenarios. When DNNs are used to extract state representations in reinforcement learning tasks, they favor representations that make the agent act well in the training scenario, so the state representations carry task-irrelevant information and the learned policy often generalizes poorly. In particular, when generalizing from multiple training scenarios to multiple testing scenarios, existing algorithms for reducing overfitting in DRL models struggle to extract state representations that are common across scenarios.

To solve these problems and enable DRL agents to learn transferable or generalizable policies across different scenarios, this thesis proposes the Adversarial Discriminative Feature Separation (ADFS) algorithm. The method aims to better extract the state representations common to different scenarios of the same task, both by improving the representational capability of DNNs and by improving the RL algorithm. To this end, the algorithm combines three sub-modules that are trained together:

(1) An algorithm that mitigates overfitting by decoupling the shared feature extractor. Because the advantage function measures the value of an action in reinforcement learning, it replaces the value function, which decouples the features of the value function from those of the policy function.
(2) A method based on adversarial training, in which a newly designed adversarial discriminator assists the training of the state encoder, enabling the encoder to extract a common state representation across multiple training scenarios that confuses the discriminator.
(3) A feature separation method that learns the scenario-discriminative representation of the state in two stages, so that this representation and the common representation of the scenarios jointly train the adversarial discriminator and further correct the direction in which the discriminator discriminates.

The proposed algorithm is evaluated in the publicly available OpenAI Procgen and Car Racing training environments. The generalization ability and the generalization gap of the ADFS model are tested in 16 game environments, followed by an analysis of the experimental results and ablation experiments for each module of the algorithm. The results show that ADFS generalizes better in most of the 16 game environments than both the benchmark algorithms in the field and the best-performing recent algorithms evaluated on the same benchmark environments.
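
The adversarial idea in module (2) and the generalization gap used in the evaluation can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not specify the loss functions, so the cross-entropy discriminator loss, the uniform-target confusion loss for the encoder, and the train-minus-test definition of the gap are all assumptions, and every function name below is hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scenario logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_loss(logits, scenario_id):
    """Assumed discriminator objective: cross-entropy for predicting
    which training scenario a state representation came from."""
    return -math.log(softmax(logits)[scenario_id])


def confusion_loss(logits):
    """Assumed encoder objective: cross-entropy between the uniform
    distribution over scenarios and the discriminator's prediction.
    It is smallest when the discriminator cannot tell scenarios apart."""
    probs = softmax(logits)
    return -sum(math.log(p) for p in probs) / len(probs)


def generalization_gap(train_returns, test_returns):
    """Assumed definition: mean training return minus mean test return."""
    mean_train = sum(train_returns) / len(train_returns)
    mean_test = sum(test_returns) / len(test_returns)
    return mean_train - mean_test


if __name__ == "__main__":
    # A discriminator that is sure of the scenario is costly for the encoder.
    print(confusion_loss([5.0, 0.0, 0.0]))
    # A fully confused discriminator gives the minimum confusion loss.
    print(confusion_loss([0.0, 0.0, 0.0]))
    print(generalization_gap([10.0, 12.0], [7.0, 9.0]))
```

Under these assumptions, the encoder is rewarded when the discriminator's output over scenarios is close to uniform, which is one common way to express "confusing the discriminator"; other formulations, such as maximizing the discriminator's loss directly, would fit the abstract equally well.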
Keywords/Search Tags: Deep reinforcement learning, Generalization, Advantage function, Adversarial training, Discriminative features