| Stochastic dynamic systems and Lebesgue sampling systems are widely used in fields such as communication networks, flexible manufacturing, artificial intelligence, military command and management, and everyday production, and they are an active research topic in learning and optimization. Although each field describes the system structure in its own way, most research methods share the same goal: finding an "optimal strategy" (policy) that optimizes system performance. Based on performance potential theory, this paper uses Lebesgue sampling to solve the optimal control problem of stochastic systems. Building on previous research, the paper makes three main contributions. 1. For the optimal control problem of stochastic dynamic systems, the policy iteration method is used. First, based on performance potential theory and the optimality equation of the feedback control system, a policy iteration algorithm for the model is given. Then, in a MATLAB simulation environment, the policy evaluation step of the algorithm estimates the performance potential from a constructed sample path, without identifying all the parameters of the system. Finally, policy improvement is carried out to find the optimal policy that optimizes system performance. 2. For the optimal control problem of Lebesgue sampling systems, the time aggregation method for Markov decision processes is used. First, based on the general model of the optimal control problem in the previous work, the mathematical model of the Lebesgue sampling system is given. Then, the model is solved by combining Lebesgue sampling, time aggregation, policy iteration, and analytical methods, which yields the optimal system performance and the corresponding optimal policy. Finally, the Lebesgue sampling system is compared with the traditional periodic sampling system. MATLAB simulations show that Lebesgue sampling not only improves system performance but also reduces system resource consumption, which alleviates the "curse of dimensionality" for this type of system to some extent. 3. Building on the optimization problems of the above two types of systems, reinforcement learning is used to solve the optimal control problem of discrete event dynamic systems. First, based on sample paths and Q-learning, an optimization algorithm for a first-order continuous-time stochastic dynamic system is given. Then, based on the performance potential, an online policy iteration method, also known as the SARSA algorithm, is introduced to solve the optimal control problem of the system. Finally, numerical examples show that the Lebesgue sampling strategy performs significantly better than the periodic sampling strategy, which suggests that Lebesgue sampling is better suited to practical control systems. |
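
The policy (strategy) iteration scheme built on performance potentials, summarized in the first contribution above, can be illustrated with a minimal sketch. It is not the paper's MATLAB implementation: it assumes a small finite-state, discrete-time Markov decision process with known transition probabilities and an average-reward criterion, and it solves the Poisson equation exactly rather than estimating the potentials from a sample path. The names `solve`, `evaluate`, `improve`, `policy_iteration`, and the two-state example data `P` and `f` are all hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                k = M[r][c] / M[c][c]
                M[r] = [x - k * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(P, f, policy):
    """Policy evaluation: solve the Poisson equation
    eta + g(i) - sum_j P(i, j) g(j) = f(i), normalized by g(0) = 0.
    Returns the average reward eta and the performance potentials g."""
    n = len(policy)
    A, rhs = [], []
    for i in range(n):
        a = policy[i]
        row = [1.0]  # coefficient of eta
        row += [(1.0 if i == j else 0.0) - P[a][i][j] for j in range(1, n)]
        A.append(row)
        rhs.append(f[i][a])
    x = solve(A, rhs)
    return x[0], [0.0] + x[1:]


def improve(P, f, policy, g):
    """Policy improvement: in each state pick the action that maximizes
    f(i, a) + sum_j P(j | i, a) g(j)."""
    n, m = len(policy), len(P)
    new = []
    for i in range(n):
        def q(a):
            return f[i][a] + sum(P[a][i][j] * g[j] for j in range(n))
        best = max(range(m), key=q)
        # Keep the current action on (near-)ties so the iteration terminates.
        new.append(policy[i] if q(policy[i]) >= q(best) - 1e-12 else best)
    return new


def policy_iteration(P, f):
    """Alternate evaluation and improvement until the policy is stable."""
    policy = [0] * len(f)
    while True:
        eta, g = evaluate(P, f, policy)
        new = improve(P, f, policy, g)
        if new == policy:
            return policy, eta, g
        policy = new


# Hypothetical example data: P[a][i][j] is the probability of moving from
# state i to state j under action a; f[i][a] is the one-step reward.
P = [
    [[0.9, 0.1], [0.5, 0.5]],  # action 0
    [[0.2, 0.8], [0.1, 0.9]],  # action 1
]
f = [[1.0, 0.0], [0.0, 2.0]]

policy, eta, g = policy_iteration(P, f)
print("policy:", policy, "average reward:", eta)
```

In a sample-path setting such as the one the abstract describes, the exact solve in `evaluate` would be replaced by an estimate of the potentials obtained from simulated trajectories; the improvement step would stay the same.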