With the rapid development of UAV technology worldwide, multi-UAV systems have been widely applied in military, police, agricultural, rescue, and other fields. As tasks and environments grow more complex, generating cooperative strategies has become the key to efficient cooperation and successful task execution for multi-UAV systems. Based on the theory of probabilistic model checking and reinforcement learning, this paper studies methods for synthesizing and learning cooperative strategies for multi-UAV systems, addressing the cases where the probabilistic behavior models of the UAVs are known and unknown, respectively. The main research contents and contributions of this paper are as follows:

(1) To ensure that multi-UAV systems execute tasks in strict accordance with the commander's instructions while fully exploiting the benefits of cooperation, a task planning framework based on probabilistic model checking is proposed that combines top-down instruction injection with bottom-up strategy generation. The top-down approach injects the human commander's intentions or instructions into the agent system; starting from the tasks, it focuses on the system's high-level behavior, but it depends on the design of the task instructions and cannot fully reflect the characteristics of the individual agents. The bottom-up strategy generation approach starts from individual capabilities and focuses on the low-level behavior of the agent system, but it is difficult to guarantee that the strategy generation process conforms to the commander's instructions. Therefore, according to the requirements of instruction injection and cooperative strategy generation for multi-UAV systems, and drawing on the ideas and advantages of probabilistic model checking, this paper designs a cooperative task planning framework that combines both approaches. The framework allows the commander of a multi-UAV system to issue semantic instructions top-down and uses model checking technology to convert these semantic instructions into a standard language that computers can understand. A linear temporal logic (LTL) specification is then generated for each UAV, in which the cooperation requirements of the multi-UAV system are encoded as atomic propositions. Finally, based on probabilistic model checking theory, cooperative strategies are generated bottom-up so that the multi-UAV system satisfies the commander's task specifications with maximum probability. The framework thus ensures both that the actions of the multi-UAV system match the commander's intentions and that the benefits of cooperation are fully realized.

(2) When the probabilistic behavior models of the UAVs are known, an online method for synthesizing cooperative strategies is proposed, based on LTL task descriptions and a receding horizon mechanism. It effectively synthesizes cooperative strategies that satisfy the task specifications with maximum probability, while alleviating the state explosion and high computational complexity caused by probabilistic model checking. First, under the task planning framework described above, the commander injects the task instructions top-down and the instructions are converted into an LTL specification for each UAV. Second, each LTL specification is transformed into a deterministic Rabin automaton (DRA), and a task progression metric is defined from the DRA acceptance conditions to measure how far the multi-UAV system has progressed toward satisfying the task specifications. Third, the motion behavior of each UAV is modeled as a Markov decision process (MDP); by setting two horizons that restrict the running steps of the MDP and the DRA respectively, finite-horizon product automata and product systems are constructed dynamically. Finally, cooperative strategies are synthesized using probabilistic reachability theory and the value iteration algorithm so that the multi-UAV system satisfies the task specifications with maximum probability. Simulation experiments verify the effectiveness of the proposed online synthesis method, showing that it effectively alleviates the state explosion problem of model checking and improves the efficiency of strategy synthesis.

(3) When the probabilistic behavior models of the UAVs are unknown, a method for learning cooperative strategies based on the actor-critic reinforcement learning architecture is proposed, together with a reward function construction method based on automaton acceptance conditions, which effectively improves the performance of the learning algorithm and allows the multi-UAV system to obtain correct cooperative strategies through interaction with the environment. Traditional methods based on probabilistic model checking require the system models to be completely known; when the UAVs' probabilistic behavior models are unknown, reinforcement learning must be combined with the framework to learn cooperative strategies that satisfy the task specifications. First, as above, the commander's task instructions are injected under the task planning framework and converted into an LTL specification for each UAV. Second, each LTL specification is transformed into a limit-deterministic Büchi automaton (LDBA), and a reward function is constructed from the LDBA acceptance conditions to guide the strategy learning process of the multi-UAV system. Then, deep neural networks are used to approximate the value function and the policy function, respectively. Finally, the temporal-difference algorithm and the policy gradient theorem are used to update and optimize the policy parameters so that the multi-UAV system learns cooperative strategies that satisfy the commander's task instructions. Simulation experiments verify that the proposed task-progression reward function construction method effectively improves the performance of the learning algorithm, and confirm both the correctness of the cooperative strategies learned by the multi-UAV system and the effectiveness of the proposed learning method.
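As a purely hypothetical illustration of the per-UAV specifications in (1), an LTL formula might require UAV i to eventually reach its goal region while avoiding a shared region until a teammate signals readiness; the atomic propositions below are invented for illustration and are not taken from the thesis:

```latex
% Hypothetical per-UAV LTL specification; goal_i, shared_i, and
% sync_j are illustrative atomic propositions. Cooperation with
% teammate j is encoded through the proposition sync_j.
\varphi_i \;=\; \Diamond\, \mathit{goal}_i \;\wedge\; \bigl(\neg\, \mathit{shared}_i \;\mathsf{U}\; \mathit{sync}_j\bigr)
```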
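The synthesis step in (2) can be sketched as value iteration for maximum-probability reachability on a product MDP. The toy transition model and accepting set below are illustrative placeholders, not the UAV models used in the thesis:

```python
# Value iteration for maximum-probability reachability on a toy
# product MDP (the product of a motion MDP and a DRA). States,
# actions, and probabilities are hypothetical placeholders.

# P[state][action] = list of (next_state, probability)
P = {
    "s0": {"a": [("s1", 0.8), ("s0", 0.2)], "b": [("s2", 1.0)]},
    "s1": {"a": [("goal", 0.9), ("trap", 0.1)]},
    "s2": {"a": [("goal", 0.5), ("trap", 0.5)]},
    "goal": {},   # accepting, absorbing
    "trap": {},   # rejecting, absorbing
}
ACCEPT = {"goal"}

def value_iteration(P, accept, eps=1e-8):
    # V[s] = maximum probability of eventually reaching an accepting state
    V = {s: (1.0 if s in accept else 0.0) for s in P}
    while True:
        delta = 0.0
        for s, actions in P.items():
            if s in accept or not actions:
                continue
            best = max(sum(p * V[t] for t, p in succ)
                       for succ in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    # Memoryless strategy: pick the greedy action under the converged values
    policy = {
        s: max(actions, key=lambda a: sum(p * V[t] for t, p in actions[a]))
        for s, actions in P.items() if actions and s not in accept
    }
    return V, policy

V, policy = value_iteration(P, ACCEPT)
print(V["s0"], policy["s0"])   # V["s0"] converges to 0.9 with action "a"
```

On this toy model, action "a" in s0 satisfies the fixed point v = 0.8(0.9) + 0.2v, giving reachability probability 0.9, which beats the 0.5 offered by action "b".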
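The learning scheme in (3) can be sketched with a tabular actor-critic on a toy corridor, where the reward fires only when a (hypothetical) automaton reaches its accepting state; this stands in for the deep-network implementation and LDBA reward of the thesis, which it does not reproduce:

```python
import math, random

# Tabular actor-critic on a toy corridor MDP. The reward is emitted by
# a hypothetical two-state automaton that "accepts" once the UAV
# reaches the final cell; all parameters are illustrative.

N = 5                      # corridor cells 0..4; start at 0, accept at 4
ACTIONS = (-1, +1)         # move left / move right
ALPHA_V, ALPHA_PI, GAMMA = 0.1, 0.1, 0.95

V = [0.0] * N                            # critic: state values
theta = [[0.0, 0.0] for _ in range(N)]   # actor: action preferences

def softmax_sample(prefs):
    z = [math.exp(p - max(prefs)) for p in prefs]
    s = sum(z)
    probs = [x / s for x in z]
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i, probs
    return len(probs) - 1, probs

random.seed(0)
for _ in range(2000):                    # episodes
    s = 0
    for _ in range(50):                  # step limit per episode
        a, probs = softmax_sample(theta[s])
        s2 = min(max(s + ACTIONS[a], 0), N - 1)
        reward = 1.0 if s2 == N - 1 else 0.0   # automaton-acceptance reward
        done = s2 == N - 1
        # One TD(0) error drives both the critic and the actor updates
        td = reward + (0.0 if done else GAMMA * V[s2]) - V[s]
        V[s] += ALPHA_V * td
        for b in range(len(ACTIONS)):          # softmax policy gradient
            grad = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += ALPHA_PI * td * grad
        s = s2
        if done:
            break

# After training, the actor should prefer "move right" toward the
# accepting cell, and the critic's values should rise along the corridor.
print([round(v, 2) for v in V[:N - 1]])
```

The design point mirrored here is that the reward is defined solely by the automaton's acceptance condition, so the policy gradient pushes the agent toward satisfying the specification rather than toward a hand-crafted objective.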