Research On Model-Free Differential Game Problems Based On Adaptive Dynamic Programming

Posted on: 2024-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: Q X Han
Full Text: PDF
GTID: 2530306932961009
Subject: Control Science and Engineering
Abstract/Summary:
In recent years, adaptive dynamic programming (ADP) theory has attracted increasing attention due to its broad applications. Control strategies that rely on exact system dynamics may fail to accomplish engineering tasks, because environmental and state information cannot always be measured accurately in practice, or the cost of measuring it is too high; the study of model-free ADP therefore has greater practical significance. Moreover, in a game process, the behavioral choice of each individual may not be consistent with the goal of the overall system; instead, complex cooperative, competitive, and asymmetric relations form among the individuals. It is thus of both theoretical and practical significance to investigate differential games that capture the self-motivated behaviors of individuals. This dissertation characterizes such interactive behaviors with differential game theory and proposes model-free control strategies based on ADP theory to deal with zero-sum, non-zero-sum, and Stackelberg game problems, respectively, when the system dynamics are completely unknown. The dissertation is organized as follows.

Firstly, a model-free integral output-feedback policy iteration algorithm is proposed to address the zero-sum differential game problem. For a linear continuous-time system with unknown dynamics and unmeasurable states, the differential game of the controlled system is first transformed into an optimal control problem for an augmented system. The existence and uniqueness of the solution to the game algebraic Riccati equation (GARE), as well as an upper bound on the discount factor, are then studied. To eliminate the dependence on the system states, the states are reconstructed from a finite number of measured outputs by means of a state reconstruction technique. An output-feedback Bellman equation is constructed, and model-free off-policy algorithms for the optimal control policy and the worst-case disturbance policy are designed based on integral reinforcement learning. A convergence analysis of the designed algorithms is given, and the effectiveness of the proposed approach is verified by a simulation example.

Secondly, a model-free Q-learning output-feedback policy iteration algorithm is developed to solve the non-zero-sum differential game problem. For a linear discrete-time system with unknown dynamics and unmeasurable states, a policy iteration algorithm based on state feedback is first given, and the optimal control policy based on Q-learning with state feedback is then obtained. The state reconstruction technique is employed to reconstruct the states from sampled input and output data, avoiding any requirement on direct state measurements. A Bellman equation for the Q-function is constructed from historical input and output data, and a model-free optimal output-feedback algorithm is designed using the Q-learning method. The convergence and unbiasedness of the algorithm are established, and the effectiveness of the proposed method is illustrated by a simulation example.
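The dissertation itself contains no code; as a purely illustrative aid, the following is a minimal single-player sketch of Q-learning policy iteration for a discrete-time linear-quadratic problem, showing the data-driven policy evaluation / policy improvement cycle that the second contribution builds on. The plant matrices A and B (visible only to the simulator, never to the learner), the cost weights, the exploration noise level, and all function names are assumptions made for this sketch; the thesis's actual algorithm is a two-player output-feedback scheme using reconstructed states.

```python
import numpy as np

# Illustrative plant; the learner never reads A or B, it only sees data.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
Qc, Rc = np.eye(2), np.eye(1)
n, m = 2, 1
p = n + m                      # dimension of z = [x; u]

def quad_basis(z):
    """Quadratic basis so that z' H z = quad_basis(z) @ vech(H)."""
    Zz = np.outer(z, z) * (2.0 - np.eye(p))   # double off-diagonal terms once
    return Zz[np.triu_indices(p)]

def vech_to_mat(v):
    """Rebuild the symmetric matrix H from its upper-triangular part."""
    H = np.zeros((p, p))
    H[np.triu_indices(p)] = v
    return H + H.T - np.diag(np.diag(H))

K = np.zeros((m, n))           # initial stabilizing gain (A is stable here)
for it in range(8):
    Phi, y = [], []
    x = np.random.randn(n)
    for k in range(300):       # collect data under the current policy + noise
        u = -K @ x + 0.2 * np.random.randn(m)
        x_next = A @ x + B @ u
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, -K @ x_next])
        # Bellman equation of the Q-function: z'Hz = cost + z_next' H z_next
        Phi.append(quad_basis(z) - quad_basis(z_next))
        y.append(x @ Qc @ x + u @ Rc @ u)
        x = np.random.randn(n) if (k + 1) % 50 == 0 else x_next
    vech_H, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = vech_to_mat(vech_H)
    K = np.linalg.solve(H[n:, n:], H[n:, :n])   # policy improvement
print("learned gain K:", K)
```

Only measured tuples (state, input, cost, next state) enter the least-squares step, which is what makes the iteration model-free; the thesis additionally removes the dependence on the state itself through reconstruction from input-output data.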
Thirdly, a model-free online synchronous approximate optimization algorithm based on a critic-only framework is proposed to deal with the Stackelberg differential game problem. For a nonlinear continuous-time system with unknown dynamics, a model neural network (NN) is established to reconstruct the unknown system from input and output data measured online, and the convergence of the model NN weights is proved. To reduce communication and computational resources, critic-only neural networks are designed for the respective players to approximate their value functions. The tuning laws for the weight matrices of the two critic NNs and the costate information are adjusted synchronously online and in real time, which reduces the computational complexity of the algorithm during learning. Moreover, the closed-loop system and the estimation errors of the critic NN weight matrices are shown to be uniformly ultimately bounded by means of a Lyapunov approach. The effectiveness of the proposed method is demonstrated with two comparative simulation examples.
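Again as an illustration only, here is a minimal critic-only sketch in the spirit of the third contribution: a single critic weight is tuned online by a normalized-gradient law driven by the Hamiltonian residual. The scalar plant dx/dt = -x + u, the cost x^2 + u^2, the single feature phi(x) = x^2, the gain alpha, and the state-reset excitation scheme are all assumptions made for this sketch; the thesis instead identifies unknown nonlinear dynamics with a model NN and tunes two critic NNs, one per player.

```python
import numpy as np

# Critic-only tuning on an assumed scalar plant dx/dt = -x + u with running
# cost x^2 + u^2. Here f(x) = -x and g(x) = 1 are written down directly; in
# the thesis they would be supplied by the identified model NN instead.
alpha, dt = 5.0, 1e-3          # critic gain and Euler integration step
W, x = 0.0, 2.0                # critic weight for V_hat(x) = W * x^2
for k in range(150_000):
    if abs(x) < 0.05:          # reset the state so excitation does not die out
        x = 2.0 if np.random.rand() < 0.5 else -2.0
    gradV = 2.0 * W * x                      # dV_hat/dx
    u = -0.5 * gradV                         # critic-only policy -(1/2) R^-1 g' dV/dx
    xdot = -x + u                            # f(x) + g(x) u
    sigma = 2.0 * x * xdot                   # dphi/dx * xdot
    e = W * sigma + x**2 + u**2              # Hamiltonian (Bellman) residual
    W -= alpha * sigma * e / (1.0 + sigma**2) ** 2 * dt   # normalized gradient law
    x += xdot * dt
print(f"learned critic weight: {W:.4f}   (scalar ARE solution: {np.sqrt(2)-1:.4f})")
```

For this scalar plant the weight settles near sqrt(2) - 1, the solution of the corresponding algebraic Riccati equation; the thesis's Lyapunov analysis establishes uniform ultimate boundedness for the analogous, much richer two-player NN tuning laws.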
Keywords/Search Tags: Adaptive dynamic programming, Reinforcement learning, Optimal control, Differential games