
Memory-Based Reinforcement Learning For Partially Observable Markov Decision Processes

Posted on: 2018-09-11  Degree: Master  Type: Thesis
Country: China  Candidate: J J Song  Full Text: PDF
GTID: 2350330515499319  Subject: Computer technology
Abstract/Summary:
In reinforcement learning, an agent acts on its environment and receives rewards; different actions yield different reward values from the environment. By repeatedly reinforcing the values of the action sequences that reach the goal, the agent learns a mapping from internal states to actions: this is the decision-making process.

The traditional U-Tree algorithm has achieved notable results on POMDP problems. However, because fringe nodes grow randomly, it still suffers from high computational complexity, a large tree, and heavy memory requirements. This thesis improves on the original U-Tree algorithm: it classifies the instances that take the same action at a node according to their next-step observation records, and proposes the EIU-Tree algorithm, which expands fringe nodes based on effective instances. This greatly reduces the computational scale and helps the agent learn faster and better. Simulation experiments on the 4×3 grid problem show that the algorithm outperforms the original U-Tree algorithm.

To address the slow convergence of the MU-Tree algorithm, the value-iteration step is replaced with the Sarsa(λ) algorithm for updating Q-values, yielding an algorithm based on Sarsa(λ): when the agent reaches a goal or penalty state, the SU-Tree algorithm updates the Q-values of all instances along that path, which improves the convergence speed. Simulation experiments on the classic 4×3 grid problem show that, compared with the original U-Tree and MU-Tree algorithms, the agent quickly finds an oscillation-free path from the start to the goal.
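The Sarsa(λ)-style update described above, which credits every state-action pair along the path when a reward arrives, can be sketched in tabular form as follows. This is a minimal illustrative sketch, not the thesis's SU-Tree implementation: the `env` interface (`reset()`/`step()`), the hyperparameter values, and the accumulating-trace variant are all assumptions made for the example.

```python
import random
import numpy as np

def epsilon_greedy(Q, s, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(Q.shape[1])
    return int(np.argmax(Q[s]))

def sarsa_lambda_episode(env, Q, alpha=0.1, gamma=0.95, lam=0.9,
                         epsilon=0.1, max_steps=200):
    """Run one episode of tabular Sarsa(lambda) with accumulating traces.

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done), with integer states and
    actions indexing Q; these names are illustrative.
    """
    E = np.zeros_like(Q)                 # eligibility traces, one per (s, a)
    s = env.reset()
    a = epsilon_greedy(Q, s, epsilon)
    for _ in range(max_steps):
        s2, r, done = env.step(a)
        a2 = epsilon_greedy(Q, s2, epsilon)
        # TD error; the bootstrap term vanishes at terminal states
        delta = r + gamma * Q[s2, a2] * (not done) - Q[s, a]
        E[s, a] += 1.0                   # mark the pair just visited
        Q += alpha * delta * E           # credit every recently visited pair
        E *= gamma * lam                 # decay all traces
        s, a = s2, a2
        if done:
            break
    return Q
```

Because the whole trace vector `E` is scaled into each update, one terminal reward propagates back along the visited path in a single episode, which is the mechanism the thesis relies on to speed up convergence relative to one-step updates.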
Keywords/Search Tags: Reinforcement Learning, U-Tree, Sarsa(λ) algorithm, Q-learning algorithm, POMDP