Optimality Equation Of Continuous-time Markov Decision Process Based On Discount Criterion

Posted on: 2010-06-11    Degree: Master    Type: Thesis
Country: China    Candidate: J Li    Full Text: PDF
GTID: 2189360278960189    Subject: Probability theory and mathematical statistics
Abstract/Summary:
Continuous-time Markov decision models are widely used in practice. The choice of an optimal policy depends largely on the decision criterion adopted, and the average reward criterion and the discounted reward criterion are the two most common. Most of the existing literature on continuous-time Markov decision processes concerns the average reward criterion, so research under the discounted reward criterion remains incomplete. This paper fills that gap in three respects: stating optimality conditions, establishing the optimality equation for continuous-time Markov decision processes under the discounted reward criterion, and characterizing optimal policies. It also gives decision makers a basis for solving problems involving discounted rewards that arise in economic decision making.

The paper deals with the α-discount reward criterion for continuous-time Markov decision processes in general state and action spaces, where the transition rates and the reward rates are allowed to be unbounded. The main work is as follows:

① The optimality conditions are given first, as a precondition for the existence of the optimality equation. They comprise three assumptions on the system's primitive data and two lemmas derived from those assumptions.

② Based on the proofs of these optimality conditions, the existence of the α-discount reward optimality equation is proved, and a corresponding discount optimal stationary policy is obtained in the course of the proof. The policy iteration algorithm given in the paper relies only on the three assumptions on the system's primitive data; the assumption on the relative differences of the reward function is dropped in order to preserve the authenticity of the primitive data.

③ To keep the choice of policy from being affected by randomness and to reduce instability, the existence of an ε-average optimal stationary policy is also ensured under the given optimality conditions. Some properties of average optimal stationary policies are presented, which help simplify the decision process.

④ Finally, a practical economic case from electronic commerce is used to explain in detail how the α-discount reward optimality equation is applied to such problems. To illustrate applications in other areas, the paper also briefly describes the modeling principles and the essence of those problems, showing that the optimality equation based on the discounted reward criterion works effectively for them.
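To make the role of the optimality equation and of policy iteration concrete, the following is a minimal sketch for a finite toy model only. It is not the thesis's algorithm, which treats general state and action spaces with unbounded rates. It assumes the standard finite-state form of the α-discount optimality equation, αV(i) = max over a of { r(i,a) + Σ_j q(j|i,a) V(j) }, where each row of transition rates sums to zero. All names (`q`, `r`, `alpha`, `policy_iteration`) and the numbers in the example are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def evaluate(policy, q, r, alpha):
    """Policy evaluation: solve (alpha*I - Q_f) V = r_f for stationary policy f.

    q[i][a][j] is the transition rate from state i to state j under action a
    (assumed convention), and r[i][a] is the reward rate in state i under a.
    """
    n = len(policy)
    a_mat = [[(alpha if i == j else 0.0) - q[i][policy[i]][j] for j in range(n)]
             for i in range(n)]
    b = [r[i][policy[i]] for i in range(n)]
    return solve_linear(a_mat, b)


def policy_iteration(q, r, alpha, tol=1e-12):
    """Return (policy, value) solving the finite alpha-discount optimality equation."""
    n = len(r)
    policy = [0] * n  # start from an arbitrary stationary policy
    while True:
        v = evaluate(policy, q, r, alpha)
        improved = False
        for i in range(n):
            def gain(a, i=i):
                # Right-hand side of the optimality equation for action a.
                return r[i][a] + sum(q[i][a][j] * v[j] for j in range(n))
            best = max(range(len(r[i])), key=gain)
            if gain(best) > gain(policy[i]) + tol:
                policy[i] = best
                improved = True
        if not improved:
            return policy, v


if __name__ == "__main__":
    # Hypothetical two-state, two-action model; each rate row sums to zero.
    q = [[[-1.0, 1.0], [-2.0, 2.0]],
         [[1.0, -1.0], [3.0, -3.0]]]
    r = [[1.0, 0.5], [0.0, -0.5]]
    policy, v = policy_iteration(q, r, alpha=0.1)
    print("policy:", policy)
    print("value:", v)
```

On termination, the returned value function satisfies the finite optimality equation up to the tolerance `tol`, which is the property checked when validating the sketch. Because α > 0 and each rate row sums to zero, the matrix αI − Q_f is strictly diagonally dominant, so the evaluation step always has a unique solution in this finite setting.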
Keywords/Search Tags:Continuous-time Markov Decision Processes, Optimality Equation, Optimality Policy, Character of Optimality Policy, Application Analysis