
Research On Adaptive Dynamic Programming Theory For Optimal Control Of Affine Nonlinear Systems

Posted on: 2019-09-19    Degree: Doctor    Type: Dissertation
Country: China    Candidate: G Y Xiao    Full Text: PDF
GTID: 1480306344959519    Subject: Control theory and control engineering
Abstract/Summary:
In the past two decades, with the rapid development of science and technology and the demands of industrial production, optimal control has attracted more and more research attention. Traditional nonlinear optimal control methods suffer from inherent limitations that make them difficult to implement in real applications. Recently, adaptive dynamic programming (ADP), a technique based on reinforcement learning, has been proven to be a feasible way to solve nonlinear optimal control problems. In the ADP framework, a critic-actor structure is constructed following the idea of reinforcement learning: function approximators represent the cost function and the control policy of the dynamic programming equation so that they satisfy the principle of optimality, and the optimal cost function and the optimal control policy are then obtained approximately by iterative computation. ADP can thus obtain the optimal control without the 'curse of dimensionality' of classical dynamic programming. However, the theory of ADP is far from complete, and many theoretical and technical issues remain to be addressed. In this dissertation, several optimal control problems are investigated on the basis of ADP.
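For orientation, the dynamic programming equation mentioned above takes, for a continuous-time affine nonlinear system with a cost quadratic in the control, the form of the Hamilton-Jacobi-Bellman (HJB) equation. The statement below is the standard one from the ADP literature, recalled for context rather than quoted from the dissertation:

```latex
% HJB equation for \dot{x} = f(x) + g(x)u with cost
% V(x(t)) = \int_t^{\infty} \big( Q(x) + u^{\top} R u \big)\, d\tau,  with Q \ge 0, R > 0.
0 = \min_{u} \Big[ Q(x) + u^{\top} R u
        + \nabla V^{*}(x)^{\top} \big( f(x) + g(x)\,u \big) \Big],
\qquad
u^{*}(x) = -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V^{*}(x).
```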
The main contents and contributions of the dissertation can be briefly described as follows:

(1) An online optimal control scheme is developed for a class of unknown discrete-time nonlinear systems. The proposed algorithm uses current and recorded data to obtain the optimal controller without knowledge of the system dynamics. To carry out the algorithm, a neural network (NN) is first constructed to identify the unknown system. Based on the estimated model, a novel time-based ADP algorithm that does not use the system dynamics is implemented on an actor-critic structure. Two NNs in this structure generate the optimal cost and the optimal control policy, respectively; both are updated once at each sampling instant, which is why the algorithm can be regarded as time-based. The persistence-of-excitation condition generally required in adaptive control is ensured by a new criterion that uses current and recorded data in the update of the critic NN. Lyapunov techniques show that the system states, cost function, and control signals are all uniformly ultimately bounded with small bounded errors, while the approximation errors caused by the three NNs are explicitly considered. (A minimal sketch of this update loop is given after the list.)

(2) The optimal tracking control problem (OTCP) for affine nonlinear continuous-time systems with completely unknown dynamics is addressed in a data-based manner by introducing the reinforcement learning technique. Unlike existing methods for the OTCP, the proposed data-driven policy iteration needs neither knowledge nor identification of the system dynamics, including both the drift dynamics and the input dynamics. To carry out the method, the original OTCP is pre-processed to construct an augmented system composed of the error-system dynamics and the desired-trajectory dynamics. Based on the augmented system, a data-driven policy iteration, which introduces a discount factor to solve the OTCP, is implemented on an actor-critic NN structure using only system data rather than exact knowledge of the system dynamics. Two NNs generate the optimal cost and the optimal control policy, respectively, and the weights are updated by a least-squares approach that minimizes the residual errors. The proposed method is an off-policy RL method, so the data can be sampled arbitrarily on the state and input domain. (A sketch of the least-squares evaluation step is given after the list.)

(3) The convergence property and error bounds of a continuous-time value iteration (VI) method for solving the optimal control problem of nonlinear systems are studied. As an important part of the ADP technique, a major feature of VI is that, in contrast to policy iteration (PI), it does not require an initial admissible control. The idea is to establish, from a theoretical point of view, the feasibility of the VI learning mechanism for the continuous-time nonlinear optimal control problem, and to discuss the influence of approximation errors on the convergence property. First, a contraction assumption is established for the continuous-time VI method to describe how close the total value function is to the cost of a single integral step. Based on this assumption, the convergence of the VI algorithm for the continuous-time nonlinear optimal control problem is proved for the first time, and convergence can be ensured when the iteration is initialized with an arbitrary positive semi-definite function. Meanwhile, the approximation errors incurred in each iteration when approximators implement the VI method are taken into consideration: an error-bound condition is proposed that ensures the approximate iterates converge to a neighbourhood of the optimum, and the relation between the optimal solution and the approximate iterates is also derived. To validate the theoretical results, two neural networks are implemented to construct the critic-actor framework and different desired precisions are compared. (A grid-based sketch of the VI backup follows the list.)

(4) A novel iterative ADP scheme is proposed that introduces the learning mechanism of VI to solve the constrained optimal control problem for continuous-time affine nonlinear systems using only one NN. The idea is to show, from a theoretical point of view, the feasibility of the VI learning mechanism for the constrained optimal control problem, so that the initial admissible control required by most existing PI-based works can be avoided. Meanwhile, the initial condition of the proposed VI-based method can be more general than in the traditional VI method, which requires the initial value function to be identically zero. A non-quadratic function is constructed as the performance function to handle the constrained input, and a general analytical method is then proposed to establish the convergence property. To simplify the architecture, only one critic NN is adopted to approximate the iterative value function when implementing the proposed method. (A standard form of such a non-quadratic integrand is recalled after the list.)

(5) A novel integral reinforcement learning approach based on VI is developed for designing the H∞ controller of continuous-time nonlinear systems. First, the VI learning mechanism is introduced to solve the zero-sum game problem, which is equivalent to solving the Hamilton-Jacobi-Isaacs equation arising in H∞ control. Since the proposed method is based on the VI learning mechanism, it does not require an admissible control for its implementation and thus admits a more general initial condition than PI-based works. The iterative property of the value function is analyzed for an arbitrary initial positive function, and the H∞ controller is derived as the iteration converges. For the implementation of the proposed method, three NNs are introduced to approximate the iterative value function, the iterative control policy, and the iterative disturbance policy, respectively. (A sketch of the min-max VI backup follows the list.)
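The following minimal sketch illustrates the per-sampling-instant update loop of contribution (1): an NN identifier learns the unknown discrete-time plant, while critic and actor weights are updated once per instant, the critic using the current sample together with recorded data. The plant, basis functions, gains, and buffer size are illustrative assumptions, not the dissertation's choices.

```python
import numpy as np

rng = np.random.default_rng(0)
plant = lambda x, u: 0.8 * np.sin(x) + u        # "unknown" plant, used only to simulate
phi  = lambda x: np.array([x, np.sin(x)])       # identifier basis (drift part)
sig  = lambda x: np.array([x**2, x**4])         # critic basis for the cost function
dsig = lambda x: np.array([2 * x, 4 * x**3])    # derivative of the critic basis
psi  = lambda x: np.array([x, x**3])            # actor basis for the control policy

Wf, gh = np.zeros(2), 0.5                       # identifier weights: drift and input gain
Wc, Wa = np.zeros(2), np.array([-0.1, 0.0])     # critic and actor weights
memory, x, gamma = [], 1.0, 0.95
for k in range(3000):
    u = float(Wa @ psi(x)) + 0.01 * rng.standard_normal()  # policy plus small exploration
    x1 = plant(x, u)
    # Identifier: gradient step on the one-step prediction error.
    xhat = Wf @ phi(x) + gh * u
    e = x1 - xhat
    Wf += 0.05 * e * phi(x)
    gh += 0.05 * e * u
    # Critic: temporal-difference steps on the current sample and recorded
    # data, the stored samples standing in for the excitation criterion.
    memory.append((x, u, x1))
    memory = memory[-64:]
    for xr, ur, xn in memory[-8:]:
        delta = xr**2 + ur**2 + gamma * (Wc @ sig(xn)) - Wc @ sig(xr)
        Wc += 0.01 * delta * sig(xr)
    # Actor: step the policy against the gradient of stage cost plus
    # cost-to-go, propagated through the identified input gain.
    Wa -= 0.01 * (2 * u + gamma * gh * (Wc @ dsig(xhat))) * psi(x)
    x = x1
print("critic weights:", Wc, "actor weights:", Wa)
```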
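Contribution (2) evaluates a policy by least squares from sampled data on the augmented state [e, r] (tracking error and reference) with a discounted cost. The sketch below shows that evaluation step for a scalar example; for brevity it evaluates the exploratory behaviour policy itself, whereas the dissertation's off-policy formulation additionally corrects for the gap between the behaviour input and the policy being evaluated. All dynamics, the reference, and the basis are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.05
gamma = np.exp(-0.5 * dt)                            # discount factor, rate 0.5
step = lambda x, u: x + dt * (-x + np.sin(x) + u)    # plant, used only to generate data
ref  = lambda r: r + dt * (-0.2 * r)                 # desired-trajectory dynamics
sig  = lambda e, r: np.array([e * e, e * r, r * r])  # critic basis on the augmented state

# Generate samples with an exploratory behaviour policy: the data may be
# sampled freely over the state and input domain.
data = []
x, r = 1.5, 1.0
for k in range(400):
    u = -0.5 * (x - r) + 0.3 * rng.standard_normal()
    x1, r1 = step(x, u), ref(r)
    data.append((x - r, r, u, x1 - r1, r1))
    x, r = x1, r1

# Least-squares policy evaluation: find W minimizing the residual of
#   W @ (sig(e, r) - gamma * sig(e', r')) = dt * (e^2 + u^2).
A = np.array([sig(e, r) - gamma * sig(e1, r1) for e, r, u, e1, r1 in data])
b = np.array([dt * (e * e + u * u) for e, r, u, e1, r1 in data])
W, *_ = np.linalg.lstsq(A, b, rcond=None)
print("critic weights on the augmented state:", W)
```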
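Contribution (3) analyses a VI backup in which the cost of one short integral step is added to the previous value function, starting from an arbitrary positive semi-definite initialization. A grid-based sketch of that backup for a scalar system follows; the dynamics, grids, and step size are illustrative assumptions.

```python
import numpy as np

xs = np.linspace(-2, 2, 201)                       # state grid
us = np.linspace(-2, 2, 81)                        # control grid
dt = 0.05                                          # length of the single integral step
f = lambda x, u: -x**3 + u                         # affine-in-control dynamics
V = xs**2                                          # arbitrary positive semi-definite start

for k in range(200):
    # One integral step from every grid state under every candidate control.
    Xn = xs[:, None] + dt * f(xs[:, None], us[None, :])
    cost = dt * (xs[:, None]**2 + us[None, :]**2)  # cost of the single step
    # VI backup: single-step cost plus previous value at the reached state.
    V_new = np.min(cost + np.interp(Xn, xs, V), axis=1)
    if np.max(np.abs(V_new - V)) < 1e-9:
        break
    V = V_new
print("sweeps used:", k + 1, "V(1) ~", float(np.interp(1.0, xs, V)))
```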
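For the constrained input of contribution (4), the abstract does not spell out the non-quadratic performance function; the form below is the standard choice in the constrained-ADP literature (popularized by Abu-Khalaf and Lewis) for a constraint |u| ≤ λ, and it yields a bounded optimal control:

```latex
% Non-quadratic integrand encoding the input constraint |u| <= \lambda,
% and the bounded optimal control it induces (illustrative standard form).
W(u) = 2 \int_{0}^{u} \lambda \tanh^{-1}\!\big( v/\lambda \big)\, R \, dv,
\qquad
u^{*}(x) = -\lambda \tanh\!\Big( \tfrac{1}{2\lambda}\, R^{-1} g(x)^{\top} \nabla V^{*}(x) \Big).
```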
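Contribution (5) applies the VI mechanism to the zero-sum game underlying H∞ control: each backup minimizes over the control while maximizing over the disturbance, mirroring the Hamilton-Jacobi-Isaacs condition. The sketch below mimics that min-max backup on a grid; the dynamics, grids, and attenuation level are illustrative assumptions.

```python
import numpy as np

xs = np.linspace(-2, 2, 161)                 # state grid
us = np.linspace(-1.5, 1.5, 31)              # control grid
ws = np.linspace(-1, 1, 21)                  # disturbance grid
dt, gam2 = 0.05, 4.0                         # step size; gam2 = attenuation level squared
f = lambda x, u, w: -2 * x + u + w
V = xs**2                                    # arbitrary initial positive function

for k in range(300):
    X = xs[:, None, None] + dt * f(xs[:, None, None], us[None, :, None], ws[None, None, :])
    L = dt * (xs[:, None, None]**2 + us[None, :, None]**2 - gam2 * ws[None, None, :]**2)
    Q = L + np.interp(X, xs, V)
    V_new = np.min(np.max(Q, axis=2), axis=1)    # max over disturbance, min over control
    if np.max(np.abs(V_new - V)) < 1e-9:
        break
    V = V_new
print("sweeps used:", k + 1, "V(1) ~", float(np.interp(1.0, xs, V)))
```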
Keywords/Search Tags: Adaptive dynamic programming (ADP), nonlinear optimal control, neural network, reinforcement learning, tracking control, H_∞ control