
Adaptive Optimal Control Of Markov Jump Systems Based On Q-Learning

Posted on: 2024-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: P X Zhou
Full Text: PDF
GTID: 2568307127454124
Subject: Control Science and Engineering
Abstract/Summary:
In fields such as transportation, aerospace, finance, and economics, random abrupt events or changes in operating conditions give rise to real systems with multiple operating modes that switch among one another. Such systems are commonly modeled as Markov jump systems (MJSs). However, as the application scenarios of MJSs grow more complex and the uncertain factors increase, an accurate system model becomes difficult or even impossible to obtain, which poses new challenges for optimal controller design. Q-learning, a form of reinforcement learning, offers a feasible solution to these difficulties: even without system model information, it can gradually update the optimal controller through online learning and thereby achieve adaptive capability. Nevertheless, many problems remain open for MJSs, especially discrete-time MJSs, for which research results are relatively scarce; both the theoretical framework and the implementation methods need further development. This thesis therefore studies adaptive optimal control of discrete-time MJSs based on Q-learning. The main contributions are summarized as follows:

(1) For the linear quadratic regulation problem of discrete-time MJSs, a value iteration (VI) algorithm based on Q-learning is investigated to obtain model-free adaptive optimal control policies. The algorithm consists of two steps, value update and policy update, which alternate so that adaptive optimal control is achieved by learning while controlling, even when the system dynamics are completely unknown. Unlike ordinary linear systems, MJSs involve multiple operating modes; to guarantee the feasibility of the learning algorithm, a mode-augmentation method is used in the algorithm design to increase the dimension of the estimated parameters.

(2) For the H∞ control problem of discrete-time MJSs, an online VI algorithm is investigated to obtain adaptive optimal control policies based on a two-player zero-sum game (ZSG). First, the H∞ control problem is transformed into a two-player ZSG. Then, action- and mode-dependent Q-functions are constructed and an online VI algorithm is designed to obtain the optimal policies, which minimize the cost function under worst-case disturbances. Finally, the convergence of the policies is proved. With this algorithm, stochastic stabilization and disturbance attenuation can be achieved for MJSs without knowledge of the system matrices. Since the VI algorithm updates the policies from system states obtained online, it can adapt to changes in the system parameters to a certain extent.

(3) For the optimal tracking control problem of discrete-time MJSs, a VI algorithm based on the influence function (IF) is investigated to obtain adaptive optimal tracking policies based on a two-player non-zero-sum game (NZSG). First, the optimal tracking control problem with two control inputs is transformed into a two-player NZSG, so that the optimal policies minimize their respective cost functions and reach an overall Nash equilibrium, which helps cope with multitasking and fault tolerance. Then, auxiliary functions are introduced to establish the relationship between the coupled action- and mode-dependent Q-functions in two consecutive iterations. It is proved that the Q-functions are monotonically increasing with an upper bound, and hence the convergence of the algorithm is established. The IF-based VI algorithm can effectively eliminate outlier data points and update the policies for each mode in parallel, which further improves the learning ability and applicability of the algorithm.
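The value-update/policy-update alternation described in contribution (1) can be sketched as follows. This is a minimal illustrative sketch, not the thesis's algorithm: the two-mode scalar system, cost weights, and transition matrix below are assumed purely for demonstration, and the coupled Q-matrix (H-matrix) recursion is computed here from known (A, B), whereas in the model-free setting of the thesis the same update would be estimated from online state and input data.

```python
import numpy as np

# Assumed two-mode scalar MJS: x_{k+1} = A[i] x_k + B[i] u_k in mode i.
A = [np.array([[1.1]]), np.array([[0.9]])]
B = [np.array([[1.0]]), np.array([[1.0]])]
Qc = np.array([[1.0]])               # state cost weight
Rc = np.array([[1.0]])               # input cost weight
Pr = np.array([[0.7, 0.3],           # mode transition probabilities p_ij
               [0.4, 0.6]])
n, m, modes = 1, 1, 2                # state dim, input dim, number of modes

def vi(iters=300, tol=1e-10):
    # One mode-dependent Q-function matrix H_i per mode (mode augmentation).
    H = [np.block([[Qc, np.zeros((n, m))],
                   [np.zeros((m, n)), Rc]]) for _ in range(modes)]
    for _ in range(iters):
        # Policy update: greedy value matrix P_i induced by the current H_i,
        # P_i = Hxx - Hxu Huu^{-1} Hux.
        P = []
        for i in range(modes):
            Hxu, Huu, Hux = H[i][:n, n:], H[i][n:, n:], H[i][n:, :n]
            P.append(H[i][:n, :n] - Hxu @ np.linalg.solve(Huu, Hux))
        # Value update: expectation over the next mode couples the modes.
        Hn = []
        for i in range(modes):
            Ei = sum(Pr[i, j] * P[j] for j in range(modes))
            Hn.append(np.block([[Qc + A[i].T @ Ei @ A[i], A[i].T @ Ei @ B[i]],
                                [B[i].T @ Ei @ A[i], Rc + B[i].T @ Ei @ B[i]]]))
        done = max(np.max(np.abs(Hn[i] - H[i])) for i in range(modes)) < tol
        H = Hn
        if done:
            break
    # Mode-dependent optimal gains K_i = -Huu^{-1} Hux.
    K = [-np.linalg.solve(H[i][n:, n:], H[i][n:, :n]) for i in range(modes)]
    return H, K

H, K = vi()
for i in range(modes):
    print(f"mode {i}: gain {K[i][0, 0]:+.4f}, "
          f"closed loop {(A[i] + B[i] @ K[i])[0, 0]:+.4f}")
```

In a Q-learning implementation, the entries of each H_i would be identified from measured transitions by least squares rather than from the block formula above, which is exactly what makes the resulting controller model-free.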
Keywords/Search Tags:Markov jump system, Q-learning, Value iteration, Adaptive optimal control