
Consensus Method Of Multi-agent Systems Based On Adaptive Dynamic Programming

Posted on: 2020-04-01    Degree: Doctor    Type: Dissertation
Country: China    Candidate: W Wang    Full Text: PDF
GTID: 1360330599956526    Subject: Control Science and Engineering
Abstract/Summary:
The consensus control problem of multi-agent systems (MASs) is one of the most important problems in the area of MASs, where all agents achieve synchronization via local interactions. Based on the number of leaders, consensus control problems are generally classified into three categories: the consensus problem (leaderless case), the leader-following consensus problem (single-leader case), and the containment control problem (multiple-leaders case). Traditional consensus control methods only consider the stability of the system, do not pursue its optimality, and require the system model information to be known a priori. In practical applications, most system model information is unknown or too complex to obtain, which limits the applicability of traditional consensus control methods. Adaptive dynamic programming (ADP) is an intelligent control method with self-learning and optimization abilities. It can effectively solve optimal control problems with unknown system dynamics, and has great potential for solving the model-free optimal consensus control problem of MASs. In this dissertation, the optimal containment control, the leader-following optimal consensus control, and the optimal output consensus control of heterogeneous MASs based on ADP under unknown system model information are studied. In addition, the critic network design, which is the key factor affecting the performance of ADP, is also studied, to promote the application of the ADP method to the consensus control of MASs with unknown system model information. The main research results and innovations of this dissertation are as follows:

(1) The model-free optimal containment control method of linear MASs. In the existing research concerning containment control of MASs, the dynamics of the MASs are required to be completely known. In this dissertation, a new distributed self-learning control scheme based on action-dependent heuristic dynamic programming is developed to achieve optimal containment control, where the model information of the MASs is no longer needed. The containment control problem is first transformed into a regulation problem on the dynamics of the designed local neighborhood containment error. Then, a local Q function is defined for each follower in terms of the local neighborhood containment error, the control input of the follower, and the control inputs of its neighbors. A value iteration method based on the defined local Q function is developed to solve this regulation problem, and its convergence analysis is given. A polynomial-regression-based actor-critic framework is adopted to approximate the optimal local Q functions and the optimal control policies, facilitating the implementation of the developed method. It is shown that the approximated control policies achieve containment control and constitute a global Nash equilibrium. The developed containment control method not only keeps the system stable but also guarantees its optimality.

(2) The leader-following optimal consensus control method of nonlinear MASs. Generally, the optimal consensus control problem relies on solving the coupled nonlinear Hamilton–Jacobi–Bellman (HJB) equations. Traditional ADP-based methods for approximating the solutions of these coupled equations require the system model information. To circumvent this difficulty, a distributed policy iteration ADP method is developed to approximate the solutions of the HJB equations in combination with a defined local Q function. With the approximated solutions, the optimal consensus control of unknown nonlinear MASs is realized. Besides, the convergence of the developed policy iteration ADP method is proved theoretically. An actor–critic neural network framework is constructed to approximate the local Q functions and the control policies, implementing the developed model-free optimal consensus control method. The developed method neither needs the system model information nor adopts any modeling procedure, which improves the engineering applicability of the consensus control method.

(3) The model-free optimal output consensus control method of partially observable linear heterogeneous MASs. The output consensus control policy relies on full state measurement, which is hard to fulfill in a partially observable environment. Moreover, to achieve output consensus control, one needs to know the accurate system model. To overcome these deficiencies, a Q-function-based ADP method using only measurable input/output data, without any system knowledge, is developed. First, with an adaptive distributed observer designed to estimate the output of the leader, the optimal output consensus control problem is transformed into a distributed optimal tracking control problem. To solve this tracking problem, an augmented system consisting of the systems of the follower and the leader is constructed. Then, a representation vector of the augmented state is built from measurable historical input/output data to replace the unmeasurable internal system state, which overcomes the partial observability; the rationality of this state representation is proved theoretically. To solve the distributed optimal tracking control problem with unknown system model information, a Q function using the state representation vector is defined, and a value iteration ADP algorithm is developed to approximate the optimal tracking control policy and the Q function. The convergence of the developed value iteration ADP algorithm is analyzed. The developed method uses only measurable historical input/output data to solve the optimal output consensus control problem of partially observable linear heterogeneous MASs without accessing the system model information.

(4) Gaussian process regression based adaptive critic design method using two-phase value iteration. ADP is an efficient method for realizing optimal control in an unknown environment, in which the critic network plays an important role in estimating the value function. Because of its good generalization and easy configuration, the Gaussian process regression method is widely used to construct the critic network. Conventionally, the hyper-parameters of Gaussian process regression need to be predetermined, but empirical selection may mislead Gaussian process regression into an improper modeling hypothesis space. To tackle this problem, a two-phase iteration of value function approximation and hyper-parameter optimization for Gaussian-process-regression-based adaptive critic design is presented in this dissertation, which not only approximates the value function but also optimizes the hyper-parameters online. The convergence of the presented algorithm is analyzed based on the stochastic approximation method, and sufficient conditions guaranteeing convergence are derived. These conditions point out that the performance of the developed algorithm relies mostly on the design of coordinated learning rates for the two phases. Finally, a series of numerical experiments is given to discuss the necessity of the two-phase update and demonstrate the feasibility of the developed method. By applying the developed method to the optimal consensus control problem of MASs, its effectiveness is further verified. The developed method eliminates the influence of selecting hyper-parameters based on prior knowledge when designing the critic network, and promotes the application of the ADP method in the case of unknown system models.
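To make the value-iteration idea in contribution (1) concrete, the following is a minimal single-follower sketch, not the dissertation's distributed algorithm: a linear containment-error system with hypothetical matrices A and B is used only to generate transition data, a quadratic Q function is fitted by least squares, and value iteration on the Q function recovers the optimal regulation gain without the learner ever reading the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-follower example: the local neighborhood containment
# error is assumed to evolve as e+ = A e + B u.  A and B appear only in the
# data-generation step; the learner itself never reads them.
A = np.array([[0.95, 0.10],
              [0.00, 0.90]])
B = np.array([[0.0],
              [0.1]])
Qc, Rc = np.eye(2), np.eye(1)          # quadratic stage cost e'Qc e + u'Rc u
n, m = 2, 1

def phi(z):
    """Quadratic feature vector: upper-triangular products of z = [e; u]."""
    return np.array([z[i] * z[j] for i in range(n + m) for j in range(i, n + m)])

def unvec(w):
    """Rebuild a symmetric matrix H from the least-squares weights."""
    H, idx = np.zeros((n + m, n + m)), 0
    for i in range(n + m):
        for j in range(i, n + m):
            H[i, j] = H[j, i] = w[idx] / (1.0 if i == j else 2.0)
            idx += 1
    return H

def gain(H):
    """Greedy policy u = -K e implied by the quadratic Q-matrix H."""
    return np.linalg.solve(H[n:, n:], H[n:, :n])

# Exploratory transitions (e, u, e+), gathered without knowing A, B.
data = [(e, u, A @ e + B @ u)
        for e, u in ((rng.normal(size=n), rng.normal(size=m)) for _ in range(400))]

# Value iteration on the Q function: fit H_{k+1} to the one-step targets
#   r(e, u) + min_{u'} Q_k(e+, u')   by least squares over the dataset.
H = np.eye(n + m)
for _ in range(200):
    K = gain(H)
    Phi, y = [], []
    for e, u, e_next in data:
        z_next = np.concatenate([e_next, -K @ e_next])   # greedy next action
        y.append(e @ Qc @ e + u @ Rc @ u + z_next @ H @ z_next)
        Phi.append(phi(np.concatenate([e, u])))
    w, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = unvec(w)

K = gain(H)    # learned regulation gain for the containment error
```

Because the targets here are noise-free, plain least squares suffices; the dissertation's actual scheme uses a polynomial-regression-based actor-critic and couples each follower's Q function to its neighbors' inputs.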
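Contribution (2) replaces value iteration with policy iteration: each round first evaluates the current policy's Q function exactly (policy evaluation) and then takes the greedy policy (policy improvement). The sketch below shows this structure on a hypothetical linear system rather than the dissertation's nonlinear HJB setting; note that, unlike value iteration, the initial policy must be stabilizing (here the open-loop system is stable, so K = 0 works).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stable system x+ = A x + B u; used only to simulate data.
A = np.array([[0.8, 0.2],
              [0.0, 0.7]])
B = np.array([[0.0],
              [1.0]])
Qc, Rc = np.eye(2), np.eye(1)
n, m = 2, 1

def phi(z):
    """Quadratic features of z = [x; u]."""
    return np.array([z[i] * z[j] for i in range(n + m) for j in range(i, n + m)])

def unvec(w):
    H, idx = np.zeros((n + m, n + m)), 0
    for i in range(n + m):
        for j in range(i, n + m):
            H[i, j] = H[j, i] = w[idx] / (1.0 if i == j else 2.0)
            idx += 1
    return H

K = np.zeros((m, n))          # initial policy; must be stabilizing
for _ in range(10):           # policy iteration loop
    # --- Policy evaluation: solve Q_K(x,u) = r(x,u) + Q_K(x+, -K x+)
    #     for the quadratic Q_K by least squares on sampled transitions.
    Phi, y = [], []
    for _ in range(300):
        x, u = rng.normal(size=n), rng.normal(size=m)
        xn = A @ x + B @ u
        zn = np.concatenate([xn, -K @ xn])
        Phi.append(phi(np.concatenate([x, u])) - phi(zn))
        y.append(x @ Qc @ x + u @ Rc @ u)
    w, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = unvec(w)
    # --- Policy improvement: greedy in the evaluated Q function.
    K = np.linalg.solve(H[n:, n:], H[n:, :n])
```

Each evaluation step solves a fixed-point equation for one policy, so policy iteration typically converges in far fewer rounds than value iteration, at the cost of needing that stabilizing start.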
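The key enabler in contribution (3) is that, for an observable linear system, the unmeasurable state is an exact linear function of a window of past inputs and outputs, so a representation vector built from measurable I/O history can stand in for the state. The sketch below verifies this on a hypothetical observable SISO system: the true state is logged only to check the claim, never used by the representation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical observable SISO system x+ = A x + B u, y = C x.
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
n, N = 2, 2            # N must be at least the observability index

# Simulate and record (u, y); the state x is logged only for verification.
T_steps = 300
x = rng.normal(size=n)
us, ys, xs = [], [], []
for _ in range(T_steps):
    u = rng.normal()
    us.append(u)
    ys.append(float(C @ x))
    xs.append(x.copy())
    x = A @ x + B.flatten() * u

# Representation vector z_k = [u_{k-N}, ..., u_{k-1}, y_{k-N}, ..., y_{k-1}].
Z, X = [], []
for k in range(N, T_steps):
    Z.append(us[k - N:k] + ys[k - N:k])
    X.append(xs[k])
Z, X = np.array(Z), np.array(X)

# If (A, C) is observable, x_k = M z_k exactly for some matrix M:
# stacking y_{k-N..k-1} gives O x_{k-N} plus known input terms, and the
# observability matrix O is invertible, so x_{k-N} (hence x_k) is linear
# in the recorded window.  Least squares recovers M to machine precision.
M, *_ = np.linalg.lstsq(Z, X, rcond=None)
err = np.abs(Z @ M - X).max()
```

In the dissertation this representation replaces the state inside a Q function for the augmented follower–leader system; here it only demonstrates why the replacement loses no information.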
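Contribution (4) alternates two phases: approximating the value function with a Gaussian process critic, and re-optimizing the GP hyper-parameters online instead of fixing them a priori. The toy below is a loose analogue, not the dissertation's stochastic-approximation algorithm: a scalar Markov reward process with a known closed-form value function, fitted value iteration through a GP posterior (phase 1), and lengthscale selection by grid search over the log marginal likelihood (phase 2). All numbers and the grid of candidate lengthscales are illustrative.

```python
import numpy as np

# Toy Markov reward process: s+ = 0.5 s, r(s) = s, gamma = 0.8,
# so the true value function is V(s) = s / 0.6.
gamma = 0.8
S = np.linspace(0.0, 1.0, 25)       # training states for the GP critic
noise = 1e-4                         # fixed observation-noise variance

def kernel(a, b, ell):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def log_marginal_likelihood(y, ell):
    K = kernel(S, S, ell) + noise * np.eye(len(S))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
            - 0.5 * len(S) * np.log(2 * np.pi))

def posterior_mean(y, ell, s_query):
    K = kernel(S, S, ell) + noise * np.eye(len(S))
    return kernel(s_query, S, ell) @ np.linalg.solve(K, y)

ell = 0.2                            # initial lengthscale guess
y = np.zeros_like(S)                 # current value estimates at S
for _ in range(80):
    # Phase 1: value-function approximation -- fitted value-iteration
    # targets r(s) + gamma * V_hat(s+) read through the GP posterior.
    y = S + gamma * posterior_mean(y, ell, 0.5 * S)
    # Phase 2: hyper-parameter optimization -- re-select the lengthscale
    # by marginal likelihood rather than keeping the a-priori choice.
    ell = max([0.05, 0.1, 0.2, 0.5, 1.0],
              key=lambda l: log_marginal_likelihood(y, l))

V_hat = posterior_mean(ell=ell, y=y, s_query=np.array([0.2, 0.5, 0.8]))
```

The dissertation's point survives even in this caricature: the two phases must be coordinated, since phase 1's targets depend on the posterior that phase 2 reshapes, which is exactly why the convergence conditions there hinge on the relative learning rates of the two phases.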
Keywords/Search Tags:Multi-agent systems, adaptive dynamic programming, adaptive critic design, consensus, Gaussian process regression, model-free