
Optimal Control Of Discrete-Time Systems: Average-Reward-Based Reinforcement Learning Methods

Posted on: 2022-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Hu
GTID: 2480306557995319
Subject: Applied Mathematics
Abstract/Summary:
With the development of control theory, social production places ever higher demands on system performance and control cost, and the optimal control problem is receiving increasing attention. Traditional methods for optimal control can handle only special classes of systems and have difficulty with general ones. Reinforcement learning, a model-free approach to finding optimal policies, is well suited to these general cases. In this thesis, we study optimal control problems for discrete-time systems using reinforcement learning methods within the Markov decision process framework. The main contents are as follows.

Chapters 1 and 2 summarize and analyze the state of research in reinforcement learning and optimal control, and introduce the preliminaries for the problems considered.

Chapter 3 studies Markov decision processes with uncountable state spaces. We first define a Markov decision process with an uncountable state space. We then state and prove the optimality equations under the expected total reward criterion and under the average reward criterion. Under either criterion, the optimality equations have the same structure for uncountable and countable state spaces. Finally, under the expected total reward criterion, we show that if the optimal policy is deterministic, then it is conserving. This explains why the optimal action-value function can be used to determine the optimal policy.

Chapter 4 builds on the results of Chapter 3 to study a class of stochastic discrete-time systems with state noise, in which the interval between two consecutive time instants changes with the iterations. We develop a reinforcement learning methodology for optimal control within the Markov decision process framework. First, a Markov decision process with an uncountable state space is constructed from the system equation, which transforms the original problem into that of finding an optimal policy for this Markov decision process. Then, the average reward criterion and the optimality equations are used to prove that the optimal policy of the optimal control problem is bias-optimal under the average reward criterion, and an average reward algorithm is designed accordingly. Finally, a numerical simulation verifies the feasibility of the algorithm.

Chapter 5 solves the optimal control problem for a class of discrete-time multiagent systems with average-reward-based learning algorithms. The controller structure is given, and both on-policy and off-policy average-reward-based reinforcement learning algorithms are designed to achieve state consensus in the shortest time. Numerical simulations show that both algorithms can learn the optimal communication topology and the optimal controller parameters.

Chapter 6 summarizes the thesis and outlines directions for future research.
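The average reward criterion used in Chapters 4 and 5 evaluates a policy by its long-run reward per step, rho, rather than by a discounted sum. For a finite MDP, the average-reward optimality equation in action-value form reads Q(s, a) = r(s, a) - rho* + E[max_a' Q(s', a')]. As an illustration only, the sketch below applies tabular differential Q-learning, an off-policy average-reward method published by Wan, Naik and Sutton (2021) and not the algorithm designed in this thesis, to a hypothetical two-state, two-action MDP. The MDP, the step sizes `alpha` and `eta`, and all function names are assumptions made for the example. The sketch also shows how a greedy policy is read off the learned action values.

```python
import random

# Hypothetical toy MDP (an assumption, not taken from the thesis):
# two states, two actions. Action 0 keeps the current state, action 1
# switches to the other state. Staying in state 1 pays reward 1; every
# other transition pays 0, so the optimal average reward is 1.
N_STATES, N_ACTIONS = 2, 2


def step(state, action):
    """Return (reward, next_state) for the toy MDP."""
    next_state = state if action == 0 else 1 - state
    reward = 1.0 if (state == 1 and action == 0) else 0.0
    return reward, next_state


def differential_q_learning(steps=20_000, alpha=0.1, eta=1.0, seed=0):
    """Tabular differential Q-learning, an off-policy average-reward method.

    Behaviour policy: uniformly random actions. Target policy: greedy w.r.t. Q.
    Returns the learned table of relative action values and the estimate r_bar.
    """
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    r_bar = 0.0
    state = 0
    for _ in range(steps):
        action = rng.randrange(N_ACTIONS)
        reward, next_state = step(state, action)
        # Average-reward TD error: no discount factor; r_bar is subtracted instead.
        delta = reward - r_bar + max(q[next_state]) - q[state][action]
        q[state][action] += alpha * delta
        # The average-reward estimate is driven by the same TD error.
        r_bar += eta * alpha * delta
        state = next_state
    return q, r_bar


def greedy_policy(q):
    """Pick the action with the largest relative action value in each state."""
    return [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)]


q, r_bar = differential_q_learning()
print(greedy_policy(q), round(r_bar, 3))
```

With uniform random exploration, every state-action pair keeps being visited, so `r_bar` should approach the optimal average reward of 1 and the greedy policy should be `[1, 0]`: switch out of state 0, then stay in state 1. Because no discount factor is used, the optimality equation fixes Q only up to an additive constant; the learned values are relative (differential) values rather than expected totals.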
Keywords/Search Tags: discrete-time system, optimal control, Markov decision process, average reward, reinforcement learning, multiagent system