Research On Approximate Programming Methods In Partially Observable Markov Decision Problems

Posted on: 2018-02-01

Degree: Master

Type: Thesis

Country: China

Candidate: W S Qian

Full Text: PDF

GTID: 2310330542465278

Subject: Software engineering

Abstract/Summary:

Planning in uncertain and dynamic environments is an essential capability for autonomous robots. Partially observable Markov decision processes (POMDPs) provide a rich mathematical framework for such problems and have been applied to robotic tasks such as self-driving car navigation and manipulation with robot hands. While there has been dramatic progress in solving discrete POMDPs, work on continuous POMDPs has been limited. To address the shortcomings of current algorithms, this thesis presents three novel ones:

i. To avoid the inefficient a priori discretization of the continuous state space as a grid, this thesis presents a novel, efficient algorithm for continuous POMDPs named GPG. GPG samples both a robot's state space and the corresponding belief space. It also handles continuous action and observation spaces using a sampled max operator and generalized policy graphs. Preliminary experimental results indicate that GPG is a promising new approach for robot motion planning under uncertainty.

ii. In Monte Carlo value iteration for continuous-state POMDPs, the policy graph grows over time, which drastically reduces the algorithm's performance. To address this, this thesis presents Optimized Monte Carlo Value Iteration (OMCVI). OMCVI optimizes the addition of nodes and prunes dominated or redundant nodes, constructing more compact policy graphs of comparable quality.

iii. To address the inefficiency of the heuristic search stage in traditional algorithms for POMDPs with continuous states and large observation spaces, this thesis presents a novel approach called Gingko Leaf Search (GLS). In the forward exploration phase of traditional algorithms, only the outcome with the highest potential impact is searched. GLS allows more than one outcome to be selected when their potential impacts are close to the highest one, and it adaptively adjusts the number of selected outcomes. Compared with traditional algorithms, GLS needs fewer point-based value backups and therefore saves considerable time when propagating bound improvements from beliefs deep in the search tree to the root belief. A proof of its convergence is given. Experiments show that GLS achieves a faster convergence rate and better performance.
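
The abstract does not give the exact rules used by OMCVI or GLS, so the following Python sketch only illustrates the two ideas under stated assumptions. `prune_dominated` assumes each policy-graph node is summarized by its estimated values at a shared set of sampled states and drops nodes that are pointwise dominated or exact duplicates. `select_outcomes` assumes each outcome has a non-negative "potential impact" score and keeps every outcome within a relative `tolerance` of the best. Both function names, the value representation, and the `tolerance` parameter are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Sequence


def prune_dominated(values: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of nodes to keep (assumed pruning rule, not OMCVI itself).

    A node is dropped if another node is at least as good at every sampled
    state and strictly better at one (dominated), or if it duplicates a node
    that has already been kept (redundant).
    """
    kept: List[str] = []
    for name, v in values.items():
        drop = False
        for other, w in values.items():
            if other == name:
                continue
            ge_everywhere = all(b >= a for a, b in zip(v, w))
            strictly_better = any(b > a for a, b in zip(v, w))
            if ge_everywhere and strictly_better:
                drop = True  # dominated by `other`
                break
            if ge_everywhere and other in kept:
                drop = True  # exact duplicate of a node already kept
                break
        if not drop:
            kept.append(name)
    return kept


def select_outcomes(impacts: Sequence[float], tolerance: float = 0.1) -> List[int]:
    """Return indices of outcomes whose impact is close to the highest one.

    With tolerance = 0 this reduces to the single-best-outcome rule
    (exact ties are all kept). The relative threshold is an assumption.
    """
    if not impacts:
        return []
    best = max(impacts)
    if best <= 0.0:
        # No positive impact anywhere: fall back to one best outcome.
        return [max(range(len(impacts)), key=lambda i: impacts[i])]
    threshold = (1.0 - tolerance) * best
    return [i for i, score in enumerate(impacts) if score >= threshold]


if __name__ == "__main__":
    nodes = {"a": [1.0, 2.0], "b": [0.5, 1.5], "c": [2.0, 0.0], "d": [1.0, 2.0]}
    print(prune_dominated(nodes))
    print(select_outcomes([0.5, 0.48, 0.1], tolerance=0.1))
```

In this sketch, widening or narrowing `tolerance` between search iterations would be one simple way to vary the number of selected outcomes; how GLS actually adapts that number is not stated in the abstract.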

Keywords/Search Tags:

reinforcement learning, POMDP, continuous, prune, heuristic search

Related items

1	Partial Observation Of Memory-based Reinforcement Learning Problems In Markov Decision Process
2	SgRNA Activity Prediction Method Based On Reinforcement Learning
3	Research And Application Of Incomplete Information Game Algorithm Based On Reinforcement Learning And Game Tree Search
4	The Study And Application Of Distributional Reinforcement Learning Based Reliable Decision Making Methods
5	Research On Structural Learning Based On Heuristic Search In Bayesian Networks
6	Research On Information Dissemination Control Based On Reinforcement Learning
7	Research On Community Detection And Graph Embedding Problems Based On Evolutionary Computation And Meta-heuristic Search
8	Research And Realization Of Complete Information Game Theory Based On Reinforcement Learning
9	The High-Efficient Heuristic Algorithm Design For AP3 Problem
10	Research On Multi-level Inverted Pendulum Balance Control Based On Deep Reinforcement Learning