
Discrete Time Markov Decision Processes Based On Variance Constraint

Posted on: 2022-04-02 | Degree: Master | Type: Thesis
Country: China | Candidate: H T Lin | Full Text: PDF
GTID: 2480306734465684 | Subject: Science
Abstract/Summary:
In this thesis, we study discrete-time discounted Markov decision processes with a countable state space, a Borel action space, and a non-negative reward function, subject to a variance constraint. The goal is to find a policy that maximizes the expected discounted total reward while the variance of the discounted total reward is constrained. The main difficulty is proving that an optimal policy exists under the variance constraint. To address this, we first derive a formula for the variance of the discounted total reward. Under this formula, the variance can be regarded as an expected discounted total cost whose discount factor is the square of the original one and whose cost function is h(x,g). A constant constraint on the variance is therefore equivalent to a constant constraint on this new expected discounted total cost. Thus, the existence of an optimal policy for the expected discounted total reward under a variance constraint is reduced to the existence of an optimal policy for the expected discounted total reward under a total cost constraint. For this constrained optimization problem, we use the Lagrange multiplier method to prove that there exists a randomized simple policy that maximizes the expected discounted total reward, which establishes the existence of an optimal policy for discrete-time discounted Markov decision processes under a variance constraint. Finally, an example with a variance constraint is given to illustrate the conclusion.
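As a rough illustration of the reformulation described in the abstract, the sketch below checks numerically, on a small finite chain under a fixed stationary policy, that the variance of the discounted total reward equals an expected discounted total cost with the discount factor squared. Everything concrete here is an assumption made for illustration: the two-state transition matrix P, the rewards r, the discount factor beta = 0.9, and the cost h(x) = beta^2 * Var[V(next state) | x], which follows from the law of total variance when the reward depends only on the current state. The thesis's own h(x,g) and its countable-state, Borel-action setting are not reproduced.

```python
# Illustrative sketch only: a finite chain under a fixed stationary policy.
# P, r and beta are made-up example values, not taken from the thesis.

def apply(P, v):
    """Return the matrix-vector product P v (P given as nested lists)."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def solve_discounted(P, c, gamma, iters=2000):
    """Solve w = c + gamma * P w by fixed-point iteration.

    The map is a contraction for 0 <= gamma < 1, so the iteration converges.
    """
    w = [0.0] * len(c)
    for _ in range(iters):
        Pw = apply(P, w)
        w = [ci + gamma * pw for ci, pw in zip(c, Pw)]
    return w

def variance_via_cost(P, r, beta):
    """Variance as a discounted total cost with discount factor beta**2."""
    V = solve_discounted(P, r, beta)           # expected discounted reward
    PV = apply(P, V)                           # E[V(next) | x]
    PV2 = apply(P, [v * v for v in V])         # E[V(next)^2 | x]
    # Assumed cost: h(x) = beta^2 * Var[V(next) | x]
    h = [beta ** 2 * (a - b * b) for a, b in zip(PV2, PV)]
    return solve_discounted(P, h, beta ** 2), V

def variance_via_second_moment(P, r, beta):
    """Variance computed directly as E[R^2] - (E[R])^2 for comparison."""
    V = solve_discounted(P, r, beta)
    PV = apply(P, V)
    # Second moment M satisfies M = r^2 + 2*beta*r*PV + beta^2 * P M
    c = [ri * ri + 2 * beta * ri * pv for ri, pv in zip(r, PV)]
    M = solve_discounted(P, c, beta ** 2)
    return [m - v * v for m, v in zip(M, V)]

P = [[0.9, 0.1],
     [0.5, 0.5]]      # hypothetical transition matrix under the fixed policy
r = [1.0, 0.0]        # hypothetical non-negative rewards
beta = 0.9            # hypothetical discount factor

var_cost, V = variance_via_cost(P, r, beta)
var_moment = variance_via_second_moment(P, r, beta)
print("expected reward:", V)
print("variance (discounted-cost form):", var_cost)
print("variance (second-moment form):  ", var_moment)
```

When the two computations agree, a bound on the variance can be treated like a bound on any other expected discounted total cost, which is the form of constraint that a Lagrange multiplier argument works with.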
Keywords/Search Tags: Lagrange multiplier method, Randomized simple policy, Expected discounted total reward, Optimal policy