
Acceleration of Iterative Methods for Markov Decision Processes

Posted on: 2011-06-24
Degree: Ph.D.
Type: Thesis
University: University of Toronto (Canada)
Candidate: Shlakhter, Oleksandr
Full Text: PDF
GTID: 2440390002460749
Subject: Engineering
Abstract/Summary:
This research focuses on Markov decision processes (MDPs), one of the most important and challenging areas of operations research. Every day people make many decisions: today's decisions impact tomorrow's, and tomorrow's will impact the ones made the day after. Problems in engineering, science, and business often pose similar challenges: a large number of options and uncertainty about the future. MDPs are one of the most powerful tools for solving such problems.

There are several standard methods for finding optimal or approximately optimal policies for MDPs. Approaches widely employed to solve MDP problems include value iteration and policy iteration. Although simple to implement, these approaches are limited in the size of problems they can solve, owing to the excessive computation required to find close-to-optimal solutions.

My thesis proposes new value iteration and modified policy iteration methods for classes of expected discounted MDPs and average-cost MDPs.

We establish a class of operators that can be integrated into value iteration and modified policy iteration algorithms for MDPs so as to speed up the convergence of the iterative search. Applying these operators requires little additional computation per iteration but reduces the number of iterations significantly. The development of the acceleration operators relies on two key properties of the Markov operator: contraction mapping and monotonicity in a restricted region. Since the Markov operators of the classical value iteration and modified policy iteration methods for average-cost MDPs do not possess the contraction mapping property, for these models we restrict our study to average-cost problems that can be formulated as stochastic shortest path problems.

The performance improvement is significant, while implementing the operators within value iteration is trivial. Numerical studies show that the accelerated methods can be hundreds of times more efficient at solving MDP problems than other known approaches. The computational savings are especially large when the discount factor approaches 1 and the transition probability matrix becomes dense, cases in which the standard iterative algorithms suffer from slow convergence.
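For concreteness, below is a minimal sketch of the standard (unaccelerated) value iteration scheme that the thesis speeds up, for a finite discounted MDP; the array shapes, variable names, and stopping rule are illustrative assumptions, not details taken from the thesis. The two properties the acceleration operators rely on can be stated for the Bellman operator T, defined by (Tv)(s) = max_a [ r(s,a) + gamma * sum_{s'} P(s'|s,a) v(s') ]: contraction, ||Tu - Tv||_inf <= gamma * ||u - v||_inf, and monotonicity, u <= v componentwise implies Tu <= Tv.

    import numpy as np

    def value_iteration(P, r, gamma, tol=1e-8, max_iter=100_000):
        """Plain value iteration for a finite discounted MDP (illustrative sketch).

        P: (A, S, S) array; P[a, s, t] = probability of moving from s to t under action a.
        r: (A, S) array; r[a, s] = expected one-step reward for action a in state s.
        gamma: discount factor in (0, 1).
        """
        v = np.zeros(P.shape[-1])
        for _ in range(max_iter):
            q = r + gamma * (P @ v)   # Q-values, shape (A, S): (A,S,S) @ (S,) -> (A,S)
            v_new = q.max(axis=0)     # Bellman update: (Tv)(s) = max_a q[a, s]
            # The contraction property justifies this standard stopping rule:
            # ||v_new - v||_inf < tol*(1-gamma)/(2*gamma) puts v_new within tol/2 of v*.
            if np.max(np.abs(v_new - v)) < tol * (1 - gamma) / (2 * gamma):
                break
            v = v_new
        policy = (r + gamma * (P @ v)).argmax(axis=0)  # greedy policy with respect to v
        return v, policy

The thesis's acceleration operators would be applied to the value estimate between such Bellman updates; since the abstract states only their defining properties (contraction and monotonicity on a restricted region), no specific operator is reproduced here.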
Keywords/Search Tags: Markov decision, MDP, Methods, Value iteration, Modified policy iteration, Iterative