
Optimization For POMDPs Based On Observation Directly

Posted on: 2017-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z Ji
Full Text: PDF
GTID: 2180330485453748
Subject: Control Science and Engineering
Abstract/Summary:
The optimization of complicated stochastic dynamic systems has always been an important problem in many scientific areas. In some stochastic dynamic systems with the Markov property, the system's states cannot be observed directly; instead, one obtains "observations" that are statistically related to the states. Markov decision processes (MDPs) cannot handle this kind of system, so partially observable Markov decision processes (POMDPs) were introduced as a generalization of MDPs. Because POMDPs can model complicated systems whose states cannot be observed, they apply to a wider range of problems than MDPs.

For the optimization of MDPs, different scientific areas have developed different methods. Recently, a new perspective called the sensitivity-based approach to stochastic learning and optimization was proposed. This approach combines methods from different areas, builds on potential theory, and takes two kinds of sensitivity formulas as its core. For POMDPs, the two sensitivity formulas based on the system's observations, the performance difference formula and the performance derivative formula, have recently been derived. However, the derivation of these two formulas rests on a strict assumption: under any two policies, the steady-state probabilities must be the same. In addition, some quantities and equations in that derivation are still based on the state. These two restrictions mean that the existing sensitivity-based POMDP method can only be applied to certain special queueing systems.
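For the fully observed, ergodic MDP case, the two sensitivity formulas referred to above take the following standard form in the sensitivity-based approach (notation assumed here, since the abstract does not reproduce it: $\pi$ is the stationary distribution, $f$ the reward vector, $P$ the transition matrix, $e$ the all-ones vector, $\eta$ the average reward, and $g$ the performance potential):

```latex
% Poisson equation defining the potential g under policy (P, f):
(I - P)\,g + \eta\, e = f, \qquad \eta = \pi f .

% Performance difference formula between policies (P, f) and (P', f'):
\eta' - \eta = \pi' \big[ (f' - f) + (P' - P)\, g \big] .

% Performance derivative formula along the path
% P_\delta = P + \delta (P' - P),\; f_\delta = f + \delta (f' - f):
\left.\frac{d\eta_\delta}{d\delta}\right|_{\delta = 0}
  = \pi \big[ (f' - f) + (P' - P)\, g \big] .
```

The thesis's contribution, as described, is to derive analogues of these formulas in which the distribution, rewards, and potentials are indexed by observations rather than states.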
It cannot be used for general POMDPs.

This thesis presents a sensitivity-based optimization approach that relies only on the system's observations, extending the sensitivity-based optimization approach of [26]. The new approach operates entirely on the system's observation space. Chapter 3 defines the potentials, reward functions, and other system quantities purely in terms of observations. Then, using the relation between observation-based and state-based quantities, a Poisson equation based only on observations is derived; none of these derivations requires the strict assumption above. Because it uses only observations, the policy found by this approach may be only sub-optimal; however, dispensing with the strict assumption makes it much more practical. Moreover, unlike other methods, the approach does not need to compute state-based quantities, so its computational cost may be lower. Chapter 3 focuses on the derivation of the performance difference formula and then presents an observation-based policy iteration algorithm built on it; the chapter closes with the performance derivative formula of the new approach.

The thesis also discusses the optimization of complex large-scale systems modeled as POMDPs, and presents a policy iteration algorithm based on the theory of hierarchical control. Under hierarchical control, several subsystems interact with one another, so the optimization of a complex large-scale POMDP becomes a constrained optimization problem, the constraints arising from the interactions between the subsystems.
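The abstract does not reproduce the observation-based algorithm of Chapter 3, so the sketch below shows only the state-based analogue it generalizes: potential-based policy iteration for an ergodic average-reward MDP, where the potentials g solving the Poisson equation drive the greedy improvement step. All function names and the toy example are illustrative assumptions, not the thesis's own code; the thesis's algorithm replaces states with observations and state potentials with observation-based potentials.

```python
import numpy as np

def solve_poisson(P, f):
    """For a fixed policy with transition matrix P and reward vector f,
    return the average reward eta and potentials g solving the Poisson
    equation (I - P) g + eta * 1 = f, normalized so that pi @ g = 0."""
    n = len(f)
    # Stationary distribution: pi (I - P) = 0 together with pi @ 1 = 1.
    A = np.vstack([(np.eye(n) - P).T, np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    eta = pi @ f
    # Fundamental-matrix trick: (I - P + 1 pi) is invertible for an
    # ergodic chain, and its solution g satisfies pi @ g = 0.
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f - eta)
    return eta, g

def policy_iteration(P, f, max_iter=50):
    """P[a][s, s']: transition probability under action a;
    f[s, a]: one-step reward.  Improvement compares the quantities
    f[s, a] + P[a][s] @ g, which the difference formula shows to
    order policies by average reward."""
    n_states, n_actions = f.shape
    d = np.zeros(n_states, dtype=int)          # initial policy: action 0
    eta = 0.0
    for _ in range(max_iter):
        P_d = np.array([P[d[s], s] for s in range(n_states)])
        f_d = f[np.arange(n_states), d]
        eta, g = solve_poisson(P_d, f_d)
        q = np.array([[f[s, a] + P[a][s] @ g for a in range(n_actions)]
                      for s in range(n_states)])
        d_new = q.argmax(axis=1)
        if np.array_equal(d_new, d):
            break                              # greedy policy unchanged
        d = d_new
    return d, eta

# Toy example (illustrative): reward 1 for being in state 1;
# action 0 tends to keep the state, action 1 tends to switch it.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.1, 0.9], [0.9, 0.1]]])
f = np.array([[0.0, 0.0], [1.0, 1.0]])
d, eta = policy_iteration(P, f)
# Optimal policy: switch in state 0, stay in state 1 (eta = 0.9).
```

The stopping rule relies on the performance difference formula: when no action improves the quantity f[s, a] + P[a][s] @ g in any state, no policy has a higher average reward.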
In Chapter 4, we model these constraints by introducing Lagrange multipliers into the objective function. A sensitivity analysis of POMDPs with constraints is then presented, and from this analysis a policy iteration algorithm based on the theory of hierarchical control is derived. The algorithm requires no strict assumption, which makes it practical. Finally, two examples are presented to demonstrate the performance of the two algorithms.
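The abstract does not give the constraint structure, so the following is only the standard Lagrangian construction such a chapter might use (an assumption): maximize the average reward $\eta_f$ subject to average-cost constraints $\eta_{c_i} \le \alpha_i$ for cost vectors $c_i$:

```latex
L(d, \lambda)
  = \eta_f(d) - \sum_i \lambda_i \big( \eta_{c_i}(d) - \alpha_i \big)
  = \eta_{\,f - \sum_i \lambda_i c_i}(d) + \sum_i \lambda_i \alpha_i,
\qquad \lambda_i \ge 0 .
```

Because the Lagrangian collapses into the average reward of a modified reward vector $f - \sum_i \lambda_i c_i$, unconstrained (observation-based) policy iteration can be run for each fixed $\lambda$, with $\lambda$ updated in an outer loop.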
Keywords/Search Tags: POMDPs, performance difference formula, performance derivative formula, optimal policy, performance optimization, performance sensitivity analysis