
Research On Reinforcement Learning Methods Based On Weighted Double Mechanisms

Posted on: 2020-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Pan
Full Text: PDF
GTID: 2428330578477967
Subject: Computer Science and Technology
Abstract/Summary:
In reinforcement learning, the agent interacts with the environment and, at the same time, learns from the evaluative feedback the environment provides. Because this closely resembles the way humans learn new things, reinforcement learning is considered one of the important approaches to artificial general intelligence. This thesis proposes weighted double mechanisms for reinforcement learning, which are studied and analyzed in both tabular and deep reinforcement learning. The main research content can be summarized in the following three parts:

(1) When the classical tabular Q-learning algorithm uses the maximum estimator to estimate the maximum expected value, a positive bias is inevitable. This bias accumulates through bootstrapping and degrades the performance of the policy. The double Q-learning algorithm uses the double estimator instead. The double estimator effectively suppresses the positive bias, but it introduces a negative bias that also lowers policy performance. In theory, an unbiased estimate of the maximum expected value must lie between the estimates of the maximum estimator and the double estimator. Therefore, using the weighted double mechanism, this thesis introduces a weighted double estimator to estimate the maximum expected value. A heuristic function for the weight is determined by analyzing how several important factors influence the weight in multi-armed bandit problems. The weighted double Q-learning algorithm derived from the weighted double estimator evaluates the value function more accurately and also improves the performance of the policy.

(2) In deep reinforcement learning, the experience replay mechanism breaks the correlations between data, so that the data better satisfy the independent and identically distributed (i.i.d.) assumption, and it also enhances the stability of the algorithm. However, when the experience buffer is full, the First-In-First-Out retention strategy replaces the oldest experience with a new experience. Because the replaced experiences cannot be recovered, the neural network gradually forgets the knowledge they contained. To solve this problem, a double experience buffer framework is designed using the weighted double mechanism. In this framework, the two buffers maintain different state distributions over the state space, and an adaptive sampling ratio further improves data efficiency. In both discrete and continuous action problems, the double experience buffer framework effectively alleviates the forgetting problem and improves the generalization ability of the policy.

(3) The value iteration network approximates the value iteration process with a recurrent convolutional network, so the policy it generates has a certain generalization ability. However, the value iteration network updates all states equally by sweeping the state space regardless of their significance, which lowers update efficiency. To solve this problem, the idea of the weighted double mechanism is used to combine two updating methods, value iteration and asynchronous update, in the planning module, and the asynchronous value iteration network is proposed. The new network effectively reduces the number of state updates and shows superior generalization ability in larger and more complex navigation problems.
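The three estimators discussed in part (1) can be illustrated on a multi-armed bandit with a minimal Python sketch. The abstract does not give the form of the weighted double estimator or the heuristic weight function, so the convex combination below and the fixed weight `beta` are assumptions made for illustration only, as are all function names.

```python
import random


def max_estimator(means_a, means_b):
    """Maximum estimator: pool both sample sets, then take the largest mean."""
    pooled = [(a + b) / 2.0 for a, b in zip(means_a, means_b)]
    return max(pooled)


def double_estimator(means_a, means_b):
    """Double estimator: pick the arm with set A, read its value from set B."""
    best_arm = max(range(len(means_a)), key=lambda i: means_a[i])
    return means_b[best_arm]


def weighted_double_estimator(means_a, means_b, beta):
    """Assumed form: convex combination of the two estimators above.

    beta = 1.0 gives the maximum estimator, beta = 0.0 the double estimator.
    The fixed beta is a stand-in for the heuristic weight function of the
    thesis, which the abstract does not specify.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    single = max_estimator(means_a, means_b)
    double = double_estimator(means_a, means_b)
    return beta * single + (1.0 - beta) * double


def sample_means(true_means, n_samples, rng):
    """Sample mean of n_samples unit-variance Gaussian rewards for each arm."""
    return [
        sum(rng.gauss(mu, 1.0) for _ in range(n_samples)) / n_samples
        for mu in true_means
    ]


def average_estimates(true_means, n_samples, beta, trials, seed=0):
    """Average (max, double, weighted) estimates over repeated experiments."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(trials):
        a = sample_means(true_means, n_samples, rng)
        b = sample_means(true_means, n_samples, rng)
        totals[0] += max_estimator(a, b)
        totals[1] += double_estimator(a, b)
        totals[2] += weighted_double_estimator(a, b, beta)
    return tuple(t / trials for t in totals)
```

Calling `average_estimates([0.0] * 10, 10, 0.5, 500)` averages the three estimates over 500 simulated bandits whose arms all have true mean 0. In that setting the averaged maximum estimate comes out above the true maximum of 0, which is the positive bias described in part (1).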
Keywords/Search Tags: Weighted Double Mechanism, Reinforcement Learning, Deep Reinforcement Learning, Experience Replay Mechanism, Value Iteration Network