As an effective method for intelligent decision making, reinforcement learning has been widely used to solve optimal decision problems for nonlinear stochastic systems with unknown dynamics. However, when the control policy is not designed from a known model, it is largely a black box, and its behavior in different states is difficult to analyze. As a result, the control performance of reinforcement learning is mostly evaluated through trial and error, and it is difficult to provide any theoretical guarantee for the performance of the learned policy. Yet stability is the most important property of any control system, since an unstable control system is usually useless and can even be dangerous. When a control task is too complex, it can be broken down into several sub-tasks, each handled by its own model, but no existing research addresses stability guarantees for switching between multiple models. Taking UAV autonomous navigation and load delivery as the application, this paper proposes a stability theorem for reinforcement learning and designs a reinforcement learning algorithm that ensures stability when switching among multiple UAV models. The main contributions are as follows:

(1) A stability theorem for reinforcement learning based on multiple Lyapunov functions. The stability theorem for multiple Lyapunov functions in switched systems is introduced into reinforcement learning, and a stability theorem suited to reinforcement learning algorithms is proposed and proved.

(2) The Multi-Lyapunov Actor-Critic algorithm. Based on the proposed stability theorem, a reinforcement learning algorithm with a stability guarantee, called the Multi-Lyapunov Actor-Critic algorithm, is designed. The Lyapunov function, which is used to constrain the learned policy, is constructed with a fully connected neural network.

(3) UAV multi-model control based on reinforcement learning. Because it is difficult to train a UAV with reinforcement learning in three-dimensional space, imitation learning is used to pre-train the policy network and the value network, enabling autonomous navigation of the UAV in an unknown environment. In the UAV autonomous hovering task, large changes in load weight require retraining, and each retraining takes a long time. A meta-reinforcement learning algorithm is therefore used to pre-train the parameters of the control network, so that the hovering task can be completed after only a small amount of further training. Finally, the reinforcement learning algorithm with a stability guarantee is verified on the UAV load delivery task, and its effectiveness is demonstrated.