Asynchronous Optimization Algorithms For SMDP Based On Performance Potential

Posted on:2007-01-03

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Wu

Full Text:PDF

GTID:2178360182986611

Subject:Computer application technology

Abstract/Summary:

With the development of society and technology, the potential-based analysis and optimization of discrete event dynamic systems (DEDS) has become a frontier topic at the intersection of control and systems science, management, and computer science. The semi-Markov decision process (SMDP) can model a wide range of real-world systems. Motivated by practical needs, the optimization of SMDPs has become one of the research focuses of the control field. Performance potential theory provides a unified framework for the optimization of SMDPs.

This thesis is concerned with asynchronous optimization problems for semi-Markov decision processes (SMDPs) with compact action sets, based on performance potentials; all of the algorithms apply to both the discounted and the average performance criteria. First, a unified standard value iteration (VI) algorithm based directly on the equivalent infinitesimal generator A_a^v is considered, and its convergence is established. Second, unified asynchronous VI algorithms are presented, including a Gauss-Seidel iteration algorithm and an asynchronous VI algorithm based on the simulation of a sample path. Then, following performance potential theory, the corresponding modified VI is discussed. These results are also applicable to continuous-time Markov decision processes.

Traditional theoretical algorithms compute quickly and give precise results, but they usually cannot be used to optimize large-scale systems or systems about which little information is available. Simulation-based optimization algorithms, such as temporal difference (TD) learning and neuro-dynamic programming (NDP), can address this problem. On this basis, the thesis introduces unified asynchronous policy iteration (PI) algorithms, including multistage lookahead policy iteration, multistage lookahead PI based on TD learning, and NDP. These algorithms are unified for both the discounted and the average performance criteria.

Finally, a numerical example is used to show the different properties of the algorithms; the results obtained are also applicable to continuous-time Markov decision processes (MDPs). Building on the asynchronous algorithms, the thesis also introduces an optimization simulation platform, which accepts parameters suited to the system under study and facilitates performance optimization for most systems.
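
To make the difference between standard (synchronous) VI and Gauss-Seidel VI concrete, the following is a minimal sketch in Python. It is not the thesis's algorithm: it assumes a plain discrete-time, finite-action, discounted MDP rather than an SMDP described by an equivalent infinitesimal generator and performance potentials. The function names, the transition array `P`, the reward array `r`, and the discount factor `beta` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def standard_vi(P, r, beta, tol=1e-10, max_iter=10_000):
    """Synchronous VI: each sweep updates all states from the previous sweep's values."""
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    for sweep in range(1, max_iter + 1):
        # q[a, s] = r[a, s] + beta * sum_j P[a, s, j] * v[j]
        q = r + beta * (P @ v)
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, sweep
        v = v_new
    return v, max_iter


def gauss_seidel_vi(P, r, beta, tol=1e-10, max_iter=10_000):
    """Gauss-Seidel VI: each state update immediately reuses values already
    refreshed earlier in the same sweep (updates are done in place)."""
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    for sweep in range(1, max_iter + 1):
        delta = 0.0
        for s in range(n_states):
            new = max(r[a, s] + beta * (P[a, s] @ v) for a in range(n_actions))
            delta = max(delta, abs(new - v[s]))
            v[s] = new  # in-place update, visible to later states in this sweep
        if delta < tol:
            return v, sweep
    return v, max_iter


# Hypothetical 3-state, 2-action example (illustrative numbers only).
# P[a, s, j] is the probability of moving from state s to state j under action a.
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]],
    [[0.3, 0.4, 0.3], [0.5, 0.25, 0.25], [0.1, 0.1, 0.8]],
])
# r[a, s] is the one-step reward for taking action a in state s.
r = np.array([[1.0, 0.0, 2.0], [0.5, 1.5, 3.0]])
beta = 0.9

v_sync, sweeps_sync = standard_vi(P, r, beta)
v_gs, sweeps_gs = gauss_seidel_vi(P, r, beta)
print("synchronous VI :", v_sync, "sweeps:", sweeps_sync)
print("Gauss-Seidel VI:", v_gs, "sweeps:", sweeps_gs)
```

Both routines approximate the same fixed point of the Bellman optimality operator for this toy model; they differ only in the order in which updated values become visible. A sample-path variant would update only the states visited along a simulated trajectory instead of sweeping all states in a fixed order.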

Keywords/Search Tags:

Semi-Markov decision process, Performance potentials, Asynchronous iteration, Optimization simulation platform

Related items

1	Robust Control For Uncertain Semi-Markov Decision Processes Based On Performance Potentials
2	Unified Algorithms For Semi-Markov Decision Processes With Discounted And Average Criteria Based On Performance Potentials By Reinforcement Learning
3	NDP Optimization For Large-scale Markov Systems Based On Performance Potentials-learning
4	Semi-Markov-Based Security Effectiveness Evaluation And Defense Decision-Making For Dynamic Platform Techniques
5	Parallel Algorithms For Large-Scale Markov Decision Processes Based On Performance Potentials
6	Continuous-Time Unified MAXQ Algorithm And Its Application
7	Performance Sensitivity Analysis And Optimization Of Extended Markov Decision Processes
8	Study On The Learning And Planning Algorithm Of Intelligent Agent Based On Performance Potentials
9	Performance Potential-based NDP Optimization Approaches And Application Research For SMDP
10	Research On Agent Decision Problem Based On Markov Decision Process Theory