
Optimal Control For Discrete-Time Markov Processes: New Optimality Conditions And Approaches

Posted on: 2006-12-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q X Zhu
Full Text: PDF
GTID: 1100360182960542
Subject: Probability and Statistics
Abstract/Summary:
This Ph.D. thesis is mainly concerned with some important problems in discrete-time Markov decision processes (DTMDP). These problems include: (1) limsup and liminf average optimality in a denumerable space; (2) average optimality for DTMDP in Borel spaces: the existence of an average optimal stationary policy, its value iteration algorithm, and its characterization; (3) average sample-path optimality for DTMDP in Borel spaces; (4) variance optimality for DTMDP in Borel spaces; (5) strong n (n = -1, 0)-discount optimality for DTMDP in Borel spaces. The results of this thesis improve, extend, unify, and complement a number of existing results, and they also apply to a number of cases not covered in the previous literature. Interesting examples, such as an inventory system and controlled queueing models, are used to illustrate our conditions and results. The thesis is composed of seven chapters.

In Chapter I, we introduce the historical background, the subject, and the recent development of DTMDP. The main results of the thesis are also briefly introduced.

Chapter II deals with limsup and liminf average optimality in a denumerable space. We give a new set of conditions under which the existence of an optimal stationary policy for the two average criteria is ensured. The results in this chapter are applied to an admission control queueing model.

In Chapter III, we study DTMDP with Borel state and action spaces. The criterion to be minimized is the average expected cost. We first provide "two optimality inequalities with opposed directions" and give conditions for the existence of solutions to the two inequalities. Then, from the two inequalities, we ensure the existence of average optimal (deterministic) stationary policies under additional continuity-compactness assumptions. Our conditions are weaker than those in the previous literature; in particular, some new sufficient conditions for the existence of average optimal stationary policies are imposed on the primitive data of the model. Moreover, our approach differs slightly from the well-known "optimality inequality approach" widely used in DTMDP. We also study further properties of optimal policies: we not only obtain two necessary and sufficient conditions for optimal policies, but also give a "semimartingale characterization" of an average optimal stationary policy. Finally, we apply our results to a controlled queueing system as well as the generalized Potlatch process with control.

Chapter IV is concerned with the value iteration algorithm for average cost Markov decision processes in Borel state and action spaces. The costs may have neither upper nor lower bounds, in contrast to the nonnegativity (or lower-boundedness) assumptions widely used in the previous literature. Moreover, our conditions are weaker than those in the previous literature.

Chapter V deals with DTMDP with average sample-path costs (ASPC) in Borel spaces. We propose new conditions for the existence of ε-ASPC-optimal (deterministic) stationary policies within the class of all randomized history-dependent policies. Our conditions are weaker than those in the previous literature; in particular, the stochastic monotonicity condition in this chapter is used here for the first time to study the ASPC criterion. Moreover, the approach provided here differs slightly from the "optimality equation approach" widely used in the previous literature. Under a mild assumption, we also prove that ASPC-optimality and average expected cost optimality are equivalent. Finally, we use a controlled queueing system to illustrate our results.
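For reference, the criteria and inequalities named in Chapters II-V have the following standard forms in the DTMDP literature. The displays below are an illustrative sketch in assumed notation (state process {x_t}, actions {a_t}, admissible action sets A(x), one-stage cost c, transition kernel Q), not quotations from the thesis itself.

% Illustrative sketch of standard definitions; notation is assumed and may
% differ from the thesis. Limsup and liminf long-run average expected costs
% of a policy \pi from initial state x:
\[
  \bar J(x,\pi) := \limsup_{n\to\infty}\frac1n\,
    \mathbb E_x^{\pi}\Bigl[\sum_{t=0}^{n-1}c(x_t,a_t)\Bigr],
  \qquad
  \underline J(x,\pi) := \liminf_{n\to\infty}\frac1n\,
    \mathbb E_x^{\pi}\Bigl[\sum_{t=0}^{n-1}c(x_t,a_t)\Bigr].
\]
% Two average-cost optimality inequalities "with opposed directions"
% (g: optimal average cost; h, h': relative value functions):
\[
  g + h(x) \;\ge\; \inf_{a\in A(x)}\Bigl\{c(x,a)+\int_X h(y)\,Q(\mathrm dy\mid x,a)\Bigr\},
\]
\[
  g + h'(x) \;\le\; \inf_{a\in A(x)}\Bigl\{c(x,a)+\int_X h'(y)\,Q(\mathrm dy\mid x,a)\Bigr\}.
\]
% Value iteration recursion (Chapter IV), whose normalized iterates recover g:
\[
  v_{n+1}(x) = \inf_{a\in A(x)}\Bigl\{c(x,a)+\int_X v_n(y)\,Q(\mathrm dy\mid x,a)\Bigr\},
  \qquad \frac{v_n(x)}{n}\;\longrightarrow\; g.
\]

In the standard theory, a stationary policy that selects, for each state x, an action attaining the infimum in the first inequality is average optimal under suitable conditions; this is the mechanism the "optimality inequality approach" exploits.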
In Chapter VI, we study DTMDP with variance minimization in Borel spaces. The costs may have neither upper nor lower bounds. We propose another set of conditions on the system's primitive data, under which we prove the existence of a variance-minimal policy within the class of average expected cost (AEC) optimal stationary policies. It should be noted that the approach provided here differs slightly from the "optimality equation approach" widely used in the previous literature. Finally, we use a controlled queueing system to illustrate our results.

In Chapter VII, we study discrete-time Markov decision processes with average expected costs (AEC) and discount-sensitive criteria in Borel state and action spaces. We propose another set of conditions on the system's primitive data, under which we prove that (1) AEC optimality and strong (-1)-discount optimality are equivalent; (2) a condition equivalent to the strong 0-discount optimality of a stationary policy holds; and (3) strong n (n = -1, 0)-discount optimal stationary policies exist. Our conditions are weaker than those in the previous literature. Moreover, we provide a new approach to prove the existence of strong 0-discount optimal stationary policies; it differs from those in the previous literature, which proceed via bias optimality, a notion we do not use at all. Finally, we apply our results to an inventory system and a controlled queueing system.
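Likewise, the variance and discount-sensitivity criteria of Chapters VI and VII are commonly formulated as follows; again this is an illustrative sketch in assumed notation, not an excerpt from the thesis.

% Limiting average variance of a policy \pi (Chapter VI), to be minimized
% over the class of AEC-optimal stationary policies:
\[
  V(x,\pi) := \limsup_{n\to\infty}\frac1n\,
    \mathbb E_x^{\pi}\Bigl[\sum_{t=0}^{n-1}
      \bigl(c(x_t,a_t)-\bar J(x,\pi)\bigr)^{2}\Bigr].
\]
% Discounted cost and strong n-discount optimality, n = -1, 0 (Chapter VII).
% A policy \pi^* is strong n-discount optimal if the limit below holds:
\[
  V_\alpha(x,\pi) := \mathbb E_x^{\pi}\Bigl[\sum_{t=0}^{\infty}\alpha^{t} c(x_t,a_t)\Bigr],
  \qquad 0<\alpha<1,
\]
\[
  \lim_{\alpha\uparrow 1}(1-\alpha)^{-n}
    \bigl[V_\alpha(x,\pi^*)-V_\alpha(x,\pi)\bigr]\;\le\; 0
  \quad\text{for all policies }\pi\text{ and states }x.
\]

With n = -1, the factor (1 - α) turns discounted costs into average costs in the Abelian limit, which is the mechanism behind the equivalence (1) asserted for Chapter VII.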
Keywords/Search Tags:discrete-time Markov decision process, optimal stationary policy, value iteration algorithm, average criterion, variance criterion