
Acceleration of Iterative Methods for Markov Decision Processes

Posted on: 2011-06-24
Degree: Ph.D.
Type: Thesis
University: University of Toronto (Canada)
Candidate: Shlakhter, Oleksandr
Full Text: PDF
GTID: 2440390002460749
Subject: Engineering
Abstract/Summary:
This research focuses on Markov decision processes (MDPs), one of the most important and challenging areas of operations research. Every day people make many decisions: today's decisions impact tomorrow's, and tomorrow's will impact the ones made the day after. Problems in engineering, science, and business often pose similar challenges: a large number of options and uncertainty about the future. MDPs are one of the most powerful tools for solving such problems.

There are several standard methods for finding optimal or approximately optimal policies for MDPs. Approaches widely employed to solve MDP problems include value iteration and policy iteration. Although simple to implement, these approaches are limited in the size of problems they can solve, owing to the excessive computation required to find close-to-optimal solutions.

My thesis proposes new value iteration and modified policy iteration methods for classes of expected discounted MDPs and average-cost MDPs.

We establish a class of operators that can be integrated into value iteration and modified policy iteration algorithms for MDPs so as to speed up the convergence of the iterative search. Applying these operators requires little additional computation per iteration but reduces the number of iterations significantly. The development of the acceleration operators relies on two key properties of the Markov operator: contraction mapping and monotonicity in a restricted region. Since the Markov operators of the classical value iteration and modified policy iteration methods for average-cost MDPs do not possess the contraction mapping property, for these models we restrict our study to average-cost problems that can be formulated as stochastic shortest path problems.

The performance improvement is significant, while implementing the operators within value iteration is trivial. Numerical studies show that the accelerated methods can be hundreds of times more efficient at solving MDP problems than other known approaches. The computational savings are especially large when the discount factor approaches 1 and the transition probability matrix becomes dense, cases in which the standard iterative algorithms suffer from slow convergence.
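For concreteness, below is a minimal sketch of the standard (unaccelerated) value iteration scheme that the thesis speeds up, for a finite discounted MDP; the array shapes, variable names, and stopping rule are illustrative assumptions, not details taken from the thesis. The two properties the acceleration operators rely on can be stated for the Bellman operator T, defined by (Tv)(s) = max_a [ r(s,a) + gamma * sum_{s'} P(s'|s,a) v(s') ]: contraction, ||Tu - Tv||_inf <= gamma * ||u - v||_inf, and monotonicity, u <= v componentwise implies Tu <= Tv.

    import numpy as np

    def value_iteration(P, r, gamma, tol=1e-8, max_iter=100_000):
        """Plain value iteration for a finite discounted MDP (illustrative sketch).

        P: (A, S, S) array; P[a, s, t] = probability of moving from s to t under action a.
        r: (A, S) array; r[a, s] = expected one-step reward for action a in state s.
        gamma: discount factor in (0, 1).
        """
        v = np.zeros(P.shape[-1])
        for _ in range(max_iter):
            q = r + gamma * (P @ v)   # Q-values, shape (A, S): (A,S,S) @ (S,) -> (A,S)
            v_new = q.max(axis=0)     # Bellman update: (Tv)(s) = max_a q[a, s]
            # The contraction property justifies this standard stopping rule:
            # ||v_new - v||_inf < tol*(1-gamma)/(2*gamma) puts v_new within tol/2 of v*.
            if np.max(np.abs(v_new - v)) < tol * (1 - gamma) / (2 * gamma):
                break
            v = v_new
        policy = (r + gamma * (P @ v)).argmax(axis=0)  # greedy policy with respect to v
        return v, policy

The thesis's acceleration operators would be applied to the value estimate between such Bellman updates; since the abstract states only their defining properties (contraction and monotonicity on a restricted region), no specific operator is reproduced here.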
Keywords/Search Tags: Markov decision, MDP, Methods, Value iteration, Modified policy iteration, Iterative