
Complexity Theory And Planning Algorithms In Partially Observable Markov Decision Processes

Posted on: 2013-07-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z C Zhang
Full Text: PDF
GTID: 1220330395955214
Subject: Computer application technology
Abstract/Summary:
Partially observable Markov decision processes (POMDPs) provide a general mathematical model for agents making sequential decisions in partially observable stochastic environments. POMDPs have been widely applied to model planning and learning tasks such as robotic navigation, object grasping, target tracking, and human-robot dialogue. Generally speaking, solving POMDP planning problems exactly in reasonable time is impossible, and over the past decade many approximate planning algorithms for POMDPs have been proposed. They can be broadly categorized into offline planning algorithms and online planning algorithms.

Point-based value iteration algorithms are among the most efficient offline planning algorithms and have achieved great success over the past decade. Largely because of their emergence and development, the scale of tractable POMDPs has grown from small problems with fewer than one hundred states to large problems with hundreds of thousands of states. An important driver in this development has been the notion of the δ-covering number of the reachable belief space (or simply the "covering number"). The reachable belief space is the set of belief states reachable from the initial belief state under arbitrary sequences of actions and observations; the covering number is the minimum number of balls of a given radius δ > 0 needed so that every reachable belief state lies within some ball in the set. Existing work has shown that an approximately optimal POMDP solution can be computed in planning time polynomial in the covering number. In this thesis, we present three algorithms for estimating the covering number and discuss their advantages and disadvantages. We show empirically, on several small-scale benchmark problems, that the covering number is a far better complexity measure than the state-space size for both planning and learning in POMDPs. Furthermore, we extend the covering number from POMDP planning to POMDP learning. We identify fundamental reasons, from a covering-number perspective, why learning can be much harder than planning, and we propose a provably convergent learning algorithm for POMDPs without reset, given knowledge of the covering number. We hope that insight into the covering number and its approximation methods may open up opportunities for new, efficient POMDP learning algorithms.

Building on this study of covering numbers, we observe that many point-based value iteration algorithms neglect important heuristic information in order to guarantee approximately optimal solutions in finite time, which leaves them less efficient than they could be. We propose an algorithmic framework based on greedy strategies: its main idea is to exploit these underused heuristics to construct a greedy sub-algorithm and insert it into existing value iteration algorithms. To test the effectiveness of the framework, we construct a mathematically grounded greedy sub-algorithm called second-best policy guidance (SBPG). Our experimental results show that algorithms combined with SBPG achieve at least an order-of-magnitude improvement over the original algorithms on several POMDP benchmark problems.
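To make the two central notions concrete, the sketch below shows the standard POMDP belief update and a naive greedy estimate of the δ-covering number of the reachable belief space. It is a minimal illustration under simplifying assumptions (a small tabular model, L1 distance between beliefs, exhaustive expansion up to a fixed horizon); it is not one of the three estimation algorithms developed in the thesis, and all function names are illustrative.

```python
import numpy as np

# Assumed tabular model: T[a, s, s'] = P(s' | s, a) and
# Z[a, s', o] = P(o | s', a). Illustration only, not the
# thesis's estimation algorithms.

def belief_update(b, a, o, T, Z):
    """Bayes filter: b'(s') ∝ Z[a, s', o] * sum_s T[a, s, s'] * b(s)."""
    b_next = Z[a, :, o] * (b @ T[a])
    total = b_next.sum()
    return b_next / total if total > 0.0 else None  # None: o impossible under (b, a)

def estimate_covering_number(b0, T, Z, n_actions, n_obs, delta, horizon):
    """Greedily count the delta-balls (L1 distance) needed to cover the
    beliefs reached from b0 within `horizon` steps. Only beliefs that open
    a new ball are expanded further, so this is a heuristic estimate."""
    centers = [b0]
    frontier = [b0]
    for _ in range(horizon):
        next_frontier = []
        for b in frontier:
            for a in range(n_actions):
                for o in range(n_obs):
                    b2 = belief_update(b, a, o, T, Z)
                    if b2 is None:
                        continue
                    # Open a new ball only if b2 is farther than delta
                    # from every existing center.
                    if all(np.abs(b2 - c).sum() > delta for c in centers):
                        centers.append(b2)
                        next_frontier.append(b2)
        frontier = next_frontier
    return len(centers)
```

A small estimated covering number signals that the reachable belief space is effectively low-dimensional, which is exactly the regime in which point-based value iteration algorithms can sample a small set of belief points and still guarantee near-optimal value.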
In contrast to offline planning algorithms, online planning algorithms make decisions on demand rather than proactively for the entire state space, and can therefore handle large-scale POMDPs efficiently within relatively short planning times. In this thesis, we exploit the structure of state representations and hybrid heuristics in POMDPs to accelerate current online heuristic planning algorithms. We propose two new online planning algorithms, used respectively to assess the benefit of a recently proposed factored state representation method and of a novel hybrid heuristic search method in speeding up online POMDP planning. Our empirical results show that the new methods substantially outperform state-of-the-art online heuristic search approaches in terms of both scalability and solution quality.
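For reference, the skeleton that online POMDP planners share is a depth-limited lookahead over the belief tree rooted at the current belief. The bare-bones sketch below shows that skeleton only; the algorithms studied in this thesis additionally prune the tree with upper and lower value bounds and exploit factored state representations, none of which is reproduced here. The reward array R[a, s] is an assumed input.

```python
import numpy as np

def online_plan(b, T, Z, R, n_actions, n_obs, gamma, depth):
    """Depth-limited expectimax over the belief tree: return (value, action)
    for belief b. T and Z are as in the previous sketch; R[a, s] is the
    expected immediate reward of action a in state s."""
    if depth == 0:
        return 0.0, None
    best_value, best_action = -np.inf, None
    for a in range(n_actions):
        value = float(b @ R[a])            # expected immediate reward of a in b
        for o in range(n_obs):
            b2 = Z[a, :, o] * (b @ T[a])   # unnormalized successor belief
            p_o = b2.sum()                 # P(o | b, a)
            if p_o > 0.0:
                v_sub, _ = online_plan(b2 / p_o, T, Z, R,
                                       n_actions, n_obs, gamma, depth - 1)
                value += gamma * p_o * v_sub
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action
```

An agent would call this at every step on its current belief, execute the returned action, and update the belief with the observation it receives. The per-step cost grows as (n_actions × n_obs)^depth, which is precisely what bound-based pruning and better state representations aim to tame.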
Keywords/Search Tags: Partially observable Markov decision processes, complexity measure, covering number, planning and learning under uncertainty, offline planning algorithms, online planning algorithms