
Research On Value Reinforcement Learning Based On Generalized Fixed Points

Posted on: 2024-02-15, Degree: Master, Type: Thesis
Country: China, Candidate: Y Z Lv
GTID: 2558307136495234, Subject: Computer technology
Abstract/Summary:
In recent years, reinforcement learning has become a standard paradigm for solving sequential decision problems. For large-scale or continuous problems, a widely used technique is value function estimation, whose accuracy directly affects the performance of reinforcement learning. Linear value function estimation is one such method, but in practice its accuracy remains an open problem. To address this, a fixed-point perspective is introduced into linear value function estimation: the estimation problem is recast as the problem of finding a fixed point, which can yield more accurate value function estimates. However, the existing fixed-point solutions in reinforcement learning are not optimal, and each has its own shortcomings. Which kind of fixed-point solution is better, and how to express and approximate the optimal solution, are two main open questions in reinforcement learning, and they are the questions this thesis sets out to address. The main work and contributions of this thesis are as follows:

1. To address the question of which kind of fixed-point solution is better, this thesis proposes a model design for a generalized fixed-point solution. It makes two contributions: an extension of the fixed-point solution based on the n-step bootstrap method, and a construction of the fixed-point solution based on linear interpolation. This idea is then applied to the mature CBMPI algorithm framework, yielding the CBMPI(n,β) algorithm based on generalized fixed points.
2. To address the question of how to express and approximate the optimal solution, the thesis proposes parameter optimization of the generalized fixed-point solution based on Bayesian optimization, together with a higher-quality solution based on ensemble learning, with the aim of approximating the optimal solution and finding a better sub-optimal solution.
3. The effectiveness of the proposed algorithm is verified in the classic Tetris game environment and compared with methods reported in the literature.
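
The abstract does not give the exact form of the generalized fixed point, so the following is only a minimal sketch of one plausible reading, not the thesis's actual construction. It assumes a fixed policy with known transition matrix P and reward vector r, a linear value estimate Φθ, an n-step Bellman operator, and a parameter β that interpolates between the projected (TD-style) fixed point at β = 1 and the Bellman-residual minimizer at β = 0. The function name, the example chain, and the uniform state weights are all hypothetical.

```python
import numpy as np

def generalized_fixed_point(P, r, Phi, d, gamma, n=1, beta=1.0):
    """Solve for theta in a hypothetical (n, beta) fixed-point family.

    Assumption (not taken from the thesis): theta solves
        X^T D (A theta - r_n) = 0,
    with A = Phi - gamma^n P^n Phi and X = beta*Phi + (1-beta)*A.
    beta = 1 gives the projected n-step fixed point,
    beta = 0 gives the n-step Bellman-residual minimizer.
    """
    S = P.shape[0]
    D = np.diag(d)                      # state weighting matrix
    r_n = np.zeros(S)                   # n-step discounted reward
    P_pow = np.eye(S)                   # running power P^i
    for i in range(n):
        r_n += (gamma ** i) * P_pow @ r
        P_pow = P_pow @ P
    # After the loop, P_pow equals P^n.
    A = Phi - (gamma ** n) * P_pow @ Phi
    X = beta * Phi + (1.0 - beta) * A   # interpolated test vectors
    return np.linalg.solve(X.T @ D @ A, X.T @ D @ r_n)

# Hypothetical 3-state chain with two features per state.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.0, 0.8]])
r = np.array([0.0, 1.0, 2.0])
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])
d = np.full(3, 1.0 / 3.0)               # uniform weights, for illustration only
gamma = 0.9

for n_steps, beta in [(1, 1.0), (1, 0.0), (3, 0.5)]:
    theta = generalized_fixed_point(P, r, Phi, d, gamma, n=n_steps, beta=beta)
    print(n_steps, beta, Phi @ theta)
```

Under this reading, n and β are the two hyperparameters that a tuner such as Bayesian optimization would search over. With tabular features (Φ equal to the identity), every (n, β) pair recovers the true value function, so the choice only matters when the features cannot represent it exactly.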
Keywords/Search Tags: Linear value function estimation, Fixed point, Bayesian optimization, Ensemble learning, Tetris