Font Size: a A A

New methods for dynamic programming over an infinite time horizon

Posted on:2003-07-14Degree:Ph.DType:Dissertation
University:Stanford UniversityCandidate:O'Sullivan, Michael JustinFull Text:PDF
GTID:1460390011484937Subject:Operations Research
Abstract/Summary:
Two unresolved issues regarding dynamic programming over an infinite time horizon are addressed within this dissertation. Previous research uses policy improvement to find a strong-present-value optimal policy in such systems, but the time complexity of policy improvement is not known. Here, a method is presented for substochastic systems that breaks the problem of finding a strong-present-value optimal policy into a number of smaller dynamic programming subproblems. Each of the subproblems may be solved using linear programming, giving the entire process a polynomial running time with regard to the size of the original dynamic programming problem. Also, for a specialization of a substochastic system, other solution methods may be applied and the time complexity becomes strongly polynomial.; For normalized systems, policy improvement is still the method of choice. However, policy improvement requires the solution of linear systems that may not always have full rank. In a finite precision environment, the stability of solution methods for these linear systems is critical. One may simplify the computations associated with policy improvement by classifying the states and considering each class separately. Here, a method is presented that applies to any policy with substochastic classes. The method uses the state classification to break the linear system into many smaller linear systems that are either of full rank or rank-deficient by one. Each of the smaller linear systems is then solved in a numerically stable way.; During the development of the previous numerically stable method, the need for a sparse rank-revealing LU factorization became apparent. No such factorization exists in current literature. Here, a new sparse LU factorization is presented that uses a threshold form of complete pivoting. It is found to be rank-revealing for all but the most pathological matrices.
Keywords/Search Tags:Dynamic programming, Time, Policy, Method, Uses, Linear systems
Related items