
Research On Sequence Planning Based On POMDPs

Posted on: 2016-12-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Liu
Full Text: PDF
GTID: 1220330461458546
Subject: Computer Science and Technology
Abstract/Summary:
Intelligent planning is one of the most important research fields in artificial intelligence. As the scale and complexity of problems grow, designing efficient decision algorithms for large-scale uncertain environments has become the central challenge in intelligent planning. Because the POMDP model naturally expresses uncertainty in the environment, actions, and observations, POMDP-based planning plays an important role in automatic planning research. However, solving POMDPs exactly is computationally intractable, so point-based heuristic algorithms have become the main research direction for solving large-scale POMDP problems over the past decade. Still, most current offline approximate approaches rely on a single heuristic criterion, which may not suit all applications, and most recent online approximate approaches explore belief points using only the upper bound of the value function, which reduces convergence efficiency. This dissertation studies how to design efficient heuristic POMDP algorithms that address these problems.

This dissertation pursues three research themes. First, we survey the heuristic exploration criteria of the main point-based POMDP approximate algorithms and design a new hybrid heuristic method, HHVI, which combines a density-based criterion with a value-function-based criterion to improve the quality of the explored belief point set. Second, we propose a clustering-based method for constructing δ-covers of the reachable belief space, and design a point-based policy iteration algorithm, CBPI, for solving offline POMDP problems. Third, we optimize the criterion for exploring the optimal reachable belief space and design a probability-based exploration algorithm, which improves the efficiency of online POMDP planning.
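The point-based value iteration algorithms discussed above all share one core operation: the point-based Bellman backup, which produces a new α-vector that is optimal at a single belief point. The following is a minimal sketch of that standard backup in Python with NumPy; the function and variable names, and the tiny one-action POMDP in the usage line, are our own illustrative assumptions, not the dissertation's implementation:

```python
import numpy as np

def point_backup(b, alphas, T, O, R, gamma):
    """One point-based Bellman backup at belief b.

    b      : (S,) belief over states
    alphas : list of (S,) alpha-vectors (current value function)
    T      : dict a -> (S, S) transition matrix, T[a][s, s']
    O      : dict a -> (S, Z) observation matrix, O[a][s', z]
    R      : dict a -> (S,) immediate reward vector
    gamma  : discount factor
    Returns the alpha-vector maximising b . alpha after one backup.
    """
    best_val, best_alpha = -np.inf, None
    for a in T:
        g_a = R[a].astype(float).copy()
        num_obs = O[a].shape[1]
        for z in range(num_obs):
            # g_{a,z}(s) = sum_{s'} T[a][s, s'] * O[a][s', z] * alpha(s')
            cands = [T[a] @ (O[a][:, z] * alpha) for alpha in alphas]
            # keep the candidate that is best at this belief point
            g_a += gamma * max(cands, key=lambda g: b @ g)
        if b @ g_a > best_val:
            best_val, best_alpha = b @ g_a, g_a
    return best_alpha

# Toy usage: 2 states, 1 action, 1 observation, zero initial value function.
alpha = point_backup(np.array([0.5, 0.5]), [np.zeros(2)],
                     {0: np.eye(2)}, {0: np.ones((2, 1))},
                     {0: np.array([1.0, 0.0])}, 0.9)
```

Different point-based algorithms differ mainly in which beliefs they back up and in what order; that is exactly where the heuristic exploration criteria studied here come in.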
Specifically, the main contributions of this dissertation are as follows:

1. The main point-based POMDP approximate algorithms use three kinds of heuristic exploration criteria: density-based, value-function-based, and MDP-based. However, MDP-based algorithms ignore partial observability and can sometimes degenerate into a random strategy; density-based algorithms cannot control the size of the explored point set; and value-function-based algorithms cannot guarantee fast convergence because of their high complexity. This dissertation proposes a hybrid heuristic value iteration algorithm, HHVI, which combines the density-based and value-function-based criteria to evaluate explored belief points and select subsequent belief points. Experiments show that HHVI adapts better to different POMDP problems.

2. Recent research shows that the δ-covering number of the reachable belief space measures the complexity of approximate algorithms. However, computing the δ-covering number of the reachable belief space exactly is NP-hard. This dissertation analyzes the clustering structure of the reachable belief space, clusters the reachable belief points, and designs an efficient method for constructing δ-covers according to the distribution of those points. We then solve offline POMDP problems by policy iteration.

3. To guarantee a globally optimal solution in theory, most current online heuristic algorithms explore belief points according to the upper bound of the value function. However, the lower bound of the value function guarantees the actual quality of a strategy, so online planning algorithms choose the action with the highest lower bound of the value function as the final decision.
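The δ-cover idea in the second contribution can be illustrated with a much simpler greedy variant: scan the reachable belief points and admit a point as a new cover centre only when no existing centre is within L1 distance δ of it. This is a sketch of the general cover-construction idea under that greedy assumption, not the clustering-based method designed in the dissertation:

```python
import numpy as np

def greedy_delta_cover(beliefs, delta):
    """Greedily pick cover centres so that every belief point lies
    within L1 distance delta of some centre. A simple illustrative
    stand-in for a clustering-based delta-cover construction."""
    centres = []
    for b in beliefs:
        if not any(np.abs(b - c).sum() <= delta for c in centres):
            centres.append(b)
    return centres

# Toy usage: two nearby beliefs collapse onto one centre.
cover = greedy_delta_cover(
    [np.array([1.0, 0.0]), np.array([0.95, 0.05]), np.array([0.0, 1.0])],
    delta=0.2)
```

The size of the resulting cover is an upper bound on the δ-covering number, which is why the quality of the construction directly affects the complexity guarantees of the point-based algorithm built on top of it.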
In view of these observations, we propose a new criterion: choose the action with the highest probability of being optimal. We design a new online POMDP approximate algorithm, PBORSI, which computes the optimal probability of each action from the probability distribution of that action's value function and selects the action with the highest optimal probability. Experiments show that PBORSI explores the optimal reachable space more accurately and thus improves convergence efficiency on some large-scale POMDP problems.
Keywords/Search Tags: Partially Observable Markov Decision Process, Point-based Value Iteration, Policy Iteration, Online Heuristic Algorithm, Monte Carlo method