
Markov Systems: Gradient Approximation Approach And Applications To Communications

Posted on: 2010-12-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B K Bao
Full Text: PDF
GTID: 1100360275455458
Subject: Control theory and control engineering
Abstract/Summary:
With the development of science and technology, large numbers of complicated stochastic systems arise in many areas, including communication (Internet and wireless), manufacturing, intelligent robotics, and traffic management. Performance optimization of these systems is a research focus of several disciplines, including perturbation analysis (PA) of discrete event dynamic systems in control engineering, Markov decision processes (MDPs) in operations research, and reinforcement learning (RL) or neuro-dynamic programming (NDP) in computer science and artificial intelligence. Although these areas take different perspectives and formulate the structure of such systems differently, they share a common goal: to make the best decisions so as to optimize system performance.

Recently, a performance optimization method based on a sensitivity point of view has been shown to explain and unify the above methods from these different areas. Potential theory is the basis of this method. Using two types of performance sensitivity formulas, performance derivatives and performance differences, existing results in these areas and the relations among them can be derived or explained in a simple and intuitive way. This method can not only obtain the optimal policy by solving for the theoretical values, but can also improve the system performance online based on a single sample path, even when the system parameters are unknown. It thus circumvents the "curse of dimensionality" and the "curse of modeling". However, this method had not been studied for adaptive Markov reward processes. In this dissertation, on the basis of this method, we first study performance sensitivity analysis for adaptive Markov reward processes, then derive performance derivative and performance difference formulas, and finally study the estimation of performance derivatives from a single sample path.

Simulation-based gradient stochastic approximation improves the parameters along a single sample path. In this method, we first parameterize the stochastic policies, then estimate the performance gradient with respect to the parameter from a single sample path, and finally improve the policy by adjusting the parameter in the gradient direction. There are two algorithms in this method, distinguished by when updates occur: the "regenerative-update gradient stochastic approximation algorithm" and the "every-update gradient stochastic approximation algorithm". The parameter is updated at the regenerative state in the first algorithm, and at every state in the other. Although both algorithms circumvent the "curse of dimensionality" and the "curse of modeling", they also encounter some problems.
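For reference, the two sensitivity formulas mentioned above have the following standard forms in the potential-theory literature; the notation here is a common convention, not necessarily the dissertation's. For an ergodic chain with transition matrix P, reward vector f, stationary distribution pi, average reward eta = pi f, and potentials g, a perturbed chain (P', f') satisfies:

```latex
% Performance difference and derivative formulas (potential theory).
% g solves the Poisson equation (I - P + e\pi) g = f, with e the all-ones vector.
\begin{align}
  \eta' - \eta &= \pi' \left[ (P' - P)\, g + (f' - f) \right]
    && \text{(performance difference)} \\
  \left.\frac{d\eta_\delta}{d\delta}\right|_{\delta=0}
    &= \pi \left[ (P' - P)\, g + (f' - f) \right]
    && \text{(performance derivative)}
\end{align}
% where P_\delta = P + \delta (P' - P) and f_\delta = f + \delta (f' - f).
```

The regenerative-update scheme described above can also be illustrated with a minimal sketch. Everything below (the two-state chain, the sigmoid policy parameterization, the step size) is a hypothetical toy example in the spirit of the algorithm, not the dissertation's implementation; for brevity it omits the average-reward baseline that practical estimators subtract to reduce variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state, 2-action Markov reward process (illustrative assumption).
P = [np.array([[0.9, 0.1],
               [0.2, 0.8]]),      # row-stochastic transitions under action 0
     np.array([[0.5, 0.5],
               [0.6, 0.4]])]      # transitions under action 1
reward = np.array([[1.0, 0.0],
                   [0.0, 2.0]])   # reward[state, action]

def action_probs(theta, s):
    """Stochastic policy: P(action = 1 | state s) = sigmoid(theta[s])."""
    p1 = 1.0 / (1.0 + np.exp(-theta[s]))
    return np.array([1.0 - p1, p1])

theta = np.zeros(2)      # one policy parameter per state
step = 0.005             # gradient step size
regen_state = 0          # state 0 is taken as the regenerative state
s = regen_state
z = np.zeros(2)          # eligibility trace: sum of score functions in the cycle
grad_est = np.zeros(2)   # likelihood-ratio gradient estimate for the cycle

for t in range(100_000):
    probs = action_probs(theta, s)
    a = rng.choice(2, p=probs)
    z[s] += a - probs[1]            # score: d/dtheta[s] of log pi(a | s)
    grad_est += reward[s, a] * z    # accumulate reward-weighted eligibility
    s = rng.choice(2, p=P[a][s])
    if s == regen_state:            # a regenerative cycle has just ended:
        theta += step * grad_est    # update in the estimated gradient direction
        z[:] = 0.0
        grad_est[:] = 0.0
```

The sketch makes the drawback concrete: theta changes only when the chain returns to the regenerative state, so if that return is rare, updates are rare and the per-cycle estimate is noisy.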
The regenerative-update algorithm updates the parameters at every visit to the regenerative state. When the state space is large, the regenerative state may be visited infrequently, which results in slow convergence and large sample-path variance. The every-update algorithm updates the parameters at every state transition, which incurs a high computational cost and is not implementable in practice. To resolve these issues, we propose two time-scale gradient stochastic approximation algorithms for Markov reward processes, adaptive Markov reward processes, and stochastic-policy Markov decision processes. The main idea is to update the parameter at a fixed, special time sequence that becomes sparser and sparser as time goes on (a minimal sketch of such a schedule is given below). Since the "updating cycles", the periods between two adjacent updating instants, are much shorter than the regenerative cycles at the beginning, the proposed scheme resolves the long-regenerative-cycle problem of the regenerative-update scheme and converges to the desired parameters much faster. In the subsequent cycles, since the updating cycles are longer than the regenerative cycles, the computational cost is significantly reduced. Our algorithm thus achieves fast convergence, small sample-path variance, and low computational cost. Moreover, it is an online algorithm that allows time-varying system parameters. We also establish that the performance gradient converges to zero with probability 1 under some mild assumptions, which is the strongest possible result for gradient-related stochastic approximation algorithms.

There is more and more research on wireless multimedia communication problems. In this dissertation, we focus on the call admission control policy for dynamic code assignment in OVSF-CDMA systems, and on a version of the problem constrained by QoS requirements. We model the problem as a Markov decision process and provide an online algorithm. The algorithm uses the two time-scale technique to reduce the computational cost and accelerate convergence. Moreover, it allows time-varying system parameters and adapts easily to the complicated circumstances of wireless multimedia communication.
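To illustrate the sparsifying update schedule mentioned above: the parameter is updated at deterministic instants whose gaps grow with the update index, so early updating cycles are short (fast initial progress) and later ones are long (low per-step cost). The polynomial schedule below is an illustrative assumption; the dissertation's exact sequence may differ.

```python
# A minimal sketch of a fixed update schedule that becomes sparser over time.
# t_k = floor(c * k**alpha) with alpha > 1 is one simple choice (assumed here).

def update_instants(c=2.0, alpha=1.5, horizon=10_000):
    """Yield update times t_k = floor(c * k**alpha) up to the horizon."""
    k = 1
    while True:
        t = int(c * k ** alpha)
        if t > horizon:
            return
        yield t
        k += 1

instants = list(update_instants())
gaps = [b - a for a, b in zip(instants, instants[1:])]
print(instants[:5], "...", instants[-3:])   # early instants are dense
print(gaps[:5], "...", gaps[-3:])           # updating cycles grow over time
```

The trade-off is visible directly in the gaps: early gaps are shorter than a typical regenerative cycle, giving frequent updates at the start, while late gaps exceed it, so updates become rare and the computational burden per transition shrinks.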
Keywords/Search Tags: Markov decision processes, Adaptive Markov decision processes, Two time-scale, Gradient approximation, Performance optimization, Performance sensitivity analysis, CDMA systems