Approximate Policy Iteration Algorithms for Continuous, Multidimensional Applications and Convergence Analysis

Posted on:2012-06-24

Degree:Ph.D

Type:Dissertation

University:Princeton University

Candidate:Ma, Jun

GTID:1450390008493827

Subject:Business Administration

Abstract/Summary:

The purpose of this dissertation is to present parametric and nonparametric policy iteration algorithms that handle Markov decision process problems with high-dimensional, continuous state and action spaces, and to conduct convergence analysis of these algorithms under a variety of technical conditions. An online, on-policy least-squares policy iteration (LSPI) algorithm is proposed that can be applied to infinite-horizon problems where states and controls are vector-valued and continuous. No special problem structure, such as linearity or additive noise, is assumed, and the expectation is assumed to be uncomputable. The concept of the post-decision state variable is used to eliminate the expectation inside the optimization problem, and a formal convergence analysis of the algorithm is provided under the assumption that value functions are spanned by finitely many known basis functions. Furthermore, the convergence result extends to the more general case of an unknown value function form by using orthogonal polynomial approximation.

Using kernel smoothing techniques, this dissertation also presents three online, on-policy approximate policy iteration algorithms that can be applied to infinite-horizon problems with continuous and high-dimensional state and action spaces: kernel-based least-squares approximate policy iteration, approximate policy iteration with kernel smoothing, and policy iteration with finite-horizon approximation and kernel estimators. Using Monte Carlo sampling to estimate the value function around the post-decision state reduces the problem to a sequence of deterministic nonlinear programming problems, which allows the algorithms to handle continuous, vector-valued states and actions. Again, a formal convergence analysis of these algorithms is presented under a variety of technical assumptions.
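
The nonparametric algorithms rest on kernel smoothing of sampled values around the post-decision state. As a minimal sketch, assuming a one-dimensional post-decision state, a Gaussian kernel, and a hypothetical toy model (post-decision state x + u, cost u²), the Python below forms a Nadaraya-Watson estimate from Monte Carlo (state, value) pairs and then chooses an action by a deterministic search. The function names, toy model, bandwidth and data are illustrative assumptions, not the dissertation's algorithms, and a simple grid search stands in for a nonlinear programming solver.

```python
import math
import random


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the normalizing constant cancels in the ratio."""
    return math.exp(-0.5 * u * u)


def nw_value(s, samples, bandwidth):
    """Nadaraya-Watson estimate of the post-decision value at state s.

    samples: list of (post_decision_state, sampled_value) pairs, e.g. Monte Carlo
    observations collected while following the current policy.
    """
    num = den = 0.0
    for s_i, v_i in samples:
        w = gaussian_kernel((s - s_i) / bandwidth)
        num += w * v_i
        den += w
    return num / den if den > 0.0 else 0.0


def greedy_action(x, samples, bandwidth, gamma, actions):
    """Deterministic search for min_u cost(x, u) + gamma * V_hat(post_decision(x, u)).

    Hypothetical toy model: post-decision state x + u and one-period cost u**2.
    No expectation appears here; the kernel estimate already averages over noise.
    """
    def objective(u):
        return u * u + gamma * nw_value(x + u, samples, bandwidth)
    return min(actions, key=objective)


if __name__ == "__main__":
    rng = random.Random(0)
    # Noisy observations of an assumed "true" post-decision value s**2 on [-2, 2].
    samples = []
    for _ in range(1000):
        s = rng.uniform(-2.0, 2.0)
        samples.append((s, s * s + rng.gauss(0.0, 0.1)))
    grid = [i / 50.0 for i in range(-100, 101)]  # candidate actions in [-2, 2]
    print(round(nw_value(1.0, samples, 0.2), 2))
    print(greedy_action(1.0, samples, 0.2, 0.9, grid))
```

The bandwidth trades bias for variance: a smaller value tracks the sampled values more closely but averages over fewer of them.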
The algorithms are applied to several numerical problems, including linear quadratic regulation, wind energy allocation, and battery storage, to demonstrate their effectiveness and convergence properties.
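
To illustrate the parametric side on the linear quadratic regulation application, here is a minimal Python sketch, assuming a scalar discounted LQR with dynamics x' = a x + b u + w, Gaussian noise w, and cost q x² + r u². Because the noise enters after the decision, the post-decision value function is spanned by the basis {s², 1}. Each round evaluates the current linear policy u = -k x with LSTD(0) on post-decision states, then improves it through a deterministic minimization that needs no expectation. The parameter values, the use of LSTD(0) as the evaluation step, and all function names are assumptions for illustration; this is not claimed to be the dissertation's exact LSPI update. The exact Riccati gain is computed for comparison.

```python
import random


def riccati_gain(a, b, q, r, gamma, iters=1000):
    """Exact discounted scalar LQR via Riccati iteration; returns (P, K*) with u = -K* x."""
    p = 0.0
    for _ in range(iters):
        p = q + gamma * a * a * p - (gamma * a * b * p) ** 2 / (r + gamma * b * b * p)
    return p, gamma * a * b * p / (r + gamma * b * b * p)


def lstd_post_decision(k, a, b, q, r, gamma, sigma, n, rng):
    """LSTD(0) evaluation of the policy u = -k x with features phi(s) = (s^2, 1)
    on the post-decision state s = a x + b u. Returns (theta_sq, theta_const)."""
    a00 = a01 = a10 = a11 = c0 = c1 = 0.0
    s_prev = 0.0
    for _ in range(n):
        x = s_prev + rng.gauss(0.0, sigma)  # exogenous noise arrives after the decision
        u = -k * x                          # current linear policy
        cost = q * x * x + r * u * u
        s = a * x + b * u                   # deterministic post-decision state
        f_prev, f_next = s_prev * s_prev, s * s
        # A += phi(s_prev) (phi(s_prev) - gamma phi(s))^T,  c += phi(s_prev) * cost
        a00 += f_prev * (f_prev - gamma * f_next)
        a01 += f_prev * (1.0 - gamma)
        a10 += f_prev - gamma * f_next
        a11 += 1.0 - gamma
        c0 += f_prev * cost
        c1 += cost
        s_prev = s
    det = a00 * a11 - a01 * a10
    # Solve the 2x2 system A theta = c by Cramer's rule.
    return (c0 * a11 - a01 * c1) / det, (a00 * c1 - a10 * c0) / det


def improve(theta_sq, a, b, r, gamma):
    """Greedy gain: argmin_u r u^2 + gamma * theta_sq * (a x + b u)^2 gives u = -k x."""
    return gamma * theta_sq * a * b / (r + gamma * theta_sq * b * b)


def approximate_policy_iteration(a, b, q, r, gamma, sigma, n=40000, rounds=5, seed=0):
    rng = random.Random(seed)
    k = 0.0  # start from "do nothing", which is stable when |a| < 1
    for _ in range(rounds):
        theta_sq, _ = lstd_post_decision(k, a, b, q, r, gamma, sigma, n, rng)
        k = improve(theta_sq, a, b, r, gamma)
    return k


if __name__ == "__main__":
    params = dict(a=0.9, b=1.0, q=1.0, r=1.0, gamma=0.9)
    _, k_star = riccati_gain(**params)
    k_api = approximate_policy_iteration(sigma=1.0, **params)
    print(f"Riccati gain {k_star:.3f}, approximate PI gain {k_api:.3f}")
```

With these settings the learned gain should land close to the Riccati gain (about 0.511); a larger sample size n tightens the agreement.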

Keywords/Search Tags:

Policy iteration, Convergence, Continuous, Problem

Related items

1	The Study Of Fast Algorithm For Large Continuous Sylvester Equation AX+XB=F
2	Variance Optimization For Continuous-time Markov Decision Processes
3	The Convergence Theorems Of Iteration Sequence Of ?-Nonexpansive Mappings
4	Energy Storage Applications of the Knowledge Gradient for Calibrating Continuous Parameters, Approximate Policy Iteration using Bellman Error Minimization with Instrumental Variables, and Covariance Matrix Estimation using an Errors-in-Variables Factor Mo
5	Study On HSS-based Iteration Methods And Accelerated Techniques For Solving Some Linear And Nonlinear Systems And A Class Of Continuous Sylvester Equations
6	Partial Order O_M- Convergence And Convergence Liminf- Set
7	Iteration Algorithms For Solving Split Equality Feasibility Problem And Its Extended Problems
8	The Analysis Of Iteration For The Nonlinear Operators Under The Condition Of Hölder Continuous Or Nondifferential
9	The Convergence Of N-Policy GI/G/1 Queuing System With Set-up Period
10	Cross National Convergence Of Biotechnology Policy