
Memory-Based Reinforcement Learning For Partially Observable Markov Decision Processes

Posted on: 2018-09-11  Degree: Master  Type: Thesis
Country: China  Candidate: J J Song  Full Text: PDF
GTID: 2350330515499319  Subject: Computer technology
Abstract/Summary:
In reinforcement learning, an agent acts on its environment and receives rewards; different actions yield different reward values from the environment. By repeatedly reinforcing the values of the action sequences that reach the goal, the agent learns a mapping from internal states to actions: this is the decision-making process.

The traditional U-Tree algorithm has achieved notable results on POMDP problems. However, because fringe nodes grow randomly, it still suffers from high computational complexity, a large tree, and heavy memory requirements. This thesis improves on the original U-Tree algorithm: it classifies the instances that take the same action at a node according to their next-step observation records, and proposes the EIU-Tree algorithm, which expands fringe nodes based on effective instances. This greatly reduces the computational scale and helps the agent learn faster and better. Simulation experiments on the 4×3 grid problem show that the algorithm outperforms the original U-Tree algorithm.

To address the slow convergence of the MU-Tree algorithm, the value-iteration step is replaced with the Sarsa(λ) algorithm for updating Q-values, yielding an algorithm based on Sarsa(λ): when the agent reaches a goal or penalty state, the SU-Tree algorithm updates the Q-values of all instances along that path, which improves the convergence speed. Simulation experiments on the classic 4×3 grid problem show that, compared with the original U-Tree and MU-Tree algorithms, the agent quickly finds an oscillation-free path from the start to the goal.
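The Sarsa(λ)-style update described above, which credits every state-action pair along the path when a reward arrives, can be sketched in tabular form as follows. This is a minimal illustrative sketch, not the thesis's SU-Tree implementation: the `env` interface (`reset()`/`step()`), the hyperparameter values, and the accumulating-trace variant are all assumptions made for the example.

```python
import random
import numpy as np

def epsilon_greedy(Q, s, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(Q.shape[1])
    return int(np.argmax(Q[s]))

def sarsa_lambda_episode(env, Q, alpha=0.1, gamma=0.95, lam=0.9,
                         epsilon=0.1, max_steps=200):
    """Run one episode of tabular Sarsa(lambda) with accumulating traces.

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done), with integer states and
    actions indexing Q; these names are illustrative.
    """
    E = np.zeros_like(Q)                 # eligibility traces, one per (s, a)
    s = env.reset()
    a = epsilon_greedy(Q, s, epsilon)
    for _ in range(max_steps):
        s2, r, done = env.step(a)
        a2 = epsilon_greedy(Q, s2, epsilon)
        # TD error; the bootstrap term vanishes at terminal states
        delta = r + gamma * Q[s2, a2] * (not done) - Q[s, a]
        E[s, a] += 1.0                   # mark the pair just visited
        Q += alpha * delta * E           # credit every recently visited pair
        E *= gamma * lam                 # decay all traces
        s, a = s2, a2
        if done:
            break
    return Q
```

Because the whole trace vector `E` is scaled into each update, one terminal reward propagates back along the visited path in a single episode, which is the mechanism the thesis relies on to speed up convergence relative to one-step updates.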
Keywords/Search Tags: Reinforcement Learning, U-Tree, Sarsa(λ) algorithm, Q-learning algorithm, POMDP