
A Study On The Theory And Methods Of UAVs Decision Making Under Uncertainty Based On The Probabilistic Model Checking

Posted on: 2017-03-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X T Ji
Full Text: PDF
GTID: 1362330569998426
Subject: Control Science and Engineering
Abstract/Summary:
How to complete complex missions under uncertain conditions is one of the key problems in improving the autonomy of UAVs and adapting them to complex combat environments. On the one hand, traditional command and control methods cannot meet the requirements of UAVs performing complex missions at a low human-to-UAV ratio, so a semantic mission-description method close to natural human language should be studied. On the other hand, imprecise modeling, environmental disturbances, and actuator deviations introduce uncertainties under which traditional deterministic decision-making methods fail, so decision-making methods for UAV missions under uncertainty should also be studied. Based on the framework of probabilistic model checking, this dissertation uses Linear Temporal Logic (LTL) to describe high-level missions and Markov Decision Processes (MDPs) to model the behaviors of UAV systems, and studies the complex mission decision-making problem for UAVs under uncertain parameters. The main contributions are as follows:

1. For the UAV complex mission decision-making problem in which the parameters of the MDP cannot be represented by a probability distribution (severe uncertainty), this dissertation is the first to propose a robust satisficing decision-making method based on info-gap decision theory, which maximizes robustness while meeting a desired performance level and gives the bound of uncertainty beyond which the policy fails. First, info-gap decision theory is introduced to describe the severe uncertainty in the parameters of the UAV system, and the info-gap-based MDP (IMDP) is constructed; the LTL specification is converted into a Deterministic Rabin Automaton (DRA), and the probabilistic model checking method is used to synthesize the product IMDP. Then, the local and global monotonicity between the uncertainty level and the value function is proven, and the robust optimality theorem and the robust satisficing optimality theorem are given, providing the theoretical basis for improving the robustness of the policy. Finally, a robust satisficing decision-making algorithm is proposed to generate a robust satisficing control policy; its convergence is established, along with the failure bounds and the tolerable degree of uncertainty. This method supports the decision making of UAVs in complex missions under severe uncertainty while reducing decision risk and enhancing the robust satisficing degree.

2. For the UAV complex mission decision-making problem in which the parameters of the MDP are unknown a priori (no prior transition probabilities), a model-free probably approximately correct (PAC) reinforcement learning approach, an improved delayed Q-learning algorithm, is proposed. Introduced into the probabilistic model checking framework, it generates an ε-near-optimal policy with respect to the LTL satisfaction probability within polynomial time and sample complexity. First, using the accepting conditions of the DRA, different weights are assigned to the states that should be visited infinitely often and finitely often, respectively, and the Rabin-weighted product MDP is constructed. Then, the delayed Q-learning algorithm is applied to the policy-generation problem for the weighted product MDP; to balance exploration and exploitation, a safe-exploration mechanism is designed that trades off safety against optimality while avoiding unsafe exploratory behaviors. Finally, by maximizing the expected total weight, a near-optimal policy is obtained. The PAC property and the convergence of the algorithm are proven, and the effectiveness of the method and the influence of different parameters are verified through simulations.

3. For the multi-objective mission decision-making problem of UAVs without prior transition probabilities of the MDP, a reinforcement-learning-based multi-stage decision-making method is proposed that maximizes the LTL satisfaction probability while minimizing the expected total control cost. First, for the multi-stage, multi-objective optimization problem, a separate action-value function is designed for each objective. Then, considering the coupling between the objectives, the incidence relationship between the value functions is modeled. Finally, a Q-learning algorithm is proposed for the multi-stage decision-making method: the first stage generates the maximal restrictive action set satisfying the LTL specification, and the second stage learns, from that set, the optimal action with minimal control cost, yielding the optimal policy for multi-objective decision making. This method can be extended to more objectives.

4. For the cooperative decision-making problem of multiple UAVs under uncertain transition probabilities, a cooperative behavior decision-making method is proposed based on mission-set division and double finite receding-horizon time windows. First, the related mission sets are divided according to the local mission specifications within horizon h and the atomic propositions describing UAV abilities. Then, the intersection automaton within horizon h and the product system within horizon H are constructed. Third, since the optimization approach of satisfying an Accepting Maximal End Component (AMEC) considers the whole LTL specification and violates the milestones within horizon h, a task-progression metric is proposed to define a progress function that leads the agents to satisfy the milestones within horizon H one by one. Finally, an equivalent expected-total-reward problem is constructed, and the cooperative behavior policy is obtained by value iteration. Through mission division and the double finite receding-horizon time windows, this method significantly reduces the state size of the decision model and improves the runtime of mission decision making, providing a novel, efficient method for online cooperative decision making by multiple UAVs.
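The robust satisficing idea in contribution 1 can be illustrated with a minimal sketch. In info-gap terms, robustness is the largest uncertainty level α at which the worst-case value still meets a critical requirement. The example below is a toy one-step model invented for illustration (the functions `worst_case_reach_prob` and `robustness` are not from the dissertation): the nominal success probability sits inside an interval info-gap ball of radius α, and bisection finds the largest tolerable α.

```python
# Toy info-gap robust satisficing sketch (illustrative names, not the
# dissertation's implementation): the uncertainty model is an interval
# ball of radius alpha around a nominal success probability.
def worst_case_reach_prob(p_nom, alpha):
    """Worst-case success probability within an info-gap ball of size alpha."""
    return max(0.0, p_nom - alpha)

def robustness(p_nom, v_critical, hi=1.0, tol=1e-6):
    """Largest alpha whose worst-case value still meets v_critical (bisection)."""
    lo = 0.0
    if worst_case_reach_prob(p_nom, 0.0) < v_critical:
        return 0.0          # requirement infeasible even with no uncertainty
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if worst_case_reach_prob(p_nom, mid) >= v_critical:
            lo = mid        # requirement still met: tolerate more uncertainty
        else:
            hi = mid
    return lo

# A policy with nominal success 0.9 tolerates uncertainty of about 0.2
# while still guaranteeing a 0.7 satisfaction probability.
print(robustness(0.9, 0.7))   # ≈ 0.2
```

The satisficing trade-off is visible here: lowering the demanded level `v_critical` buys a larger robustness α, which is exactly the trade the robust satisficing policy exploits.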
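The delayed Q-learning core of contribution 2 can be sketched on a toy problem. This is a simplified version of the standard delayed Q-learning scheme (optimistic initialization, updates attempted only after m samples, learn flags re-enabled when any value changes), run on a two-state MDP invented for illustration rather than the dissertation's Rabin-weighted product MDP, and without the safe-exploration mechanism.

```python
import random

# Simplified delayed Q-learning sketch on an invented 2-state toy MDP.
random.seed(0)
S, A = 2, 2
gamma, m, eps1 = 0.9, 20, 0.01                    # discount, delay, update threshold
Q = [[1.0 / (1 - gamma)] * A for _ in range(S)]   # optimistic initialization
U = [[0.0] * A for _ in range(S)]                 # accumulated update targets
C = [[0] * A for _ in range(S)]                   # samples since last attempted update
LEARN = [[True] * A for _ in range(S)]

def step(s, a):
    """Toy dynamics: action 1 in state 0 reaches rewarding state 1 w.p. 0.8."""
    if s == 0 and a == 1 and random.random() < 0.8:
        return 1, 1.0
    return 0, 0.0

s = 0
for _ in range(50000):
    a = max(range(A), key=lambda x: Q[s][x])      # act greedily on optimistic Q
    s2, r = step(s, a)
    if LEARN[s][a]:
        U[s][a] += r + gamma * max(Q[s2])
        C[s][a] += 1
        if C[s][a] == m:                          # attempted update after m samples
            if Q[s][a] - U[s][a] / m >= 2 * eps1:
                Q[s][a] = U[s][a] / m + eps1      # successful optimistic update
                LEARN = [[True] * A for _ in range(S)]  # values changed: re-learn
            else:
                LEARN[s][a] = False               # estimate has stabilized
            U[s][a], C[s][a] = 0.0, 0
    s = s2

# After learning, the rewarding action should dominate in state 0.
print(Q[0][1] > Q[0][0])
```

The delayed batch update and the 2ε₁ success test are what yield the PAC-style polynomial sample-complexity bound; the dissertation's improvement additionally constrains which actions may be explored.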
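The two-stage structure of contribution 3 can be sketched as follows. On a deterministic toy model invented for illustration, stage 1 learns a satisfaction-value function and keeps only the maximizing actions (the restrictive action set), and stage 2 learns costs restricted to that set; all names here (`Qp`, `Qc`, `restricted`) are illustrative, not the dissertation's.

```python
import random

# Two-stage multi-objective Q-learning sketch on an invented toy model.
# State 0 reaches the goal (state 1) via action 0 (cost 2) or action 1
# (cost 5); action 2 leads to a dead end (state 2). States 1, 2 absorb.
random.seed(1)
T = {(0, 0): 1, (0, 1): 1, (0, 2): 2}
COST = {(0, 0): 2.0, (0, 1): 5.0, (0, 2): 1.0}

# Stage 1: learn the reachability objective (reward 1 on reaching the goal).
Qp = {(0, a): 0.0 for a in range(3)}
for _ in range(2000):
    a = random.randrange(3)
    r = 1.0 if T[(0, a)] == 1 else 0.0        # absorbing successors: no bootstrap
    Qp[(0, a)] += 0.5 * (r - Qp[(0, a)])

best = max(Qp[(0, a)] for a in range(3))
restricted = [a for a in range(3) if Qp[(0, a)] >= best - 1e-6]  # satisfying actions

# Stage 2: learn expected cost, restricted to satisfaction-preserving actions.
Qc = {(0, a): 0.0 for a in restricted}
for _ in range(2000):
    a = random.choice(restricted)
    Qc[(0, a)] += 0.5 * (COST[(0, a)] - Qc[(0, a)])

print(min(restricted, key=lambda a: Qc[(0, a)]))   # cheapest satisfying action
```

Note the lexicographic coupling: the cheap dead-end action 2 is eliminated in stage 1 even though it minimizes cost, so stage 2 can only trade off cost among actions that already maximize the satisfaction probability.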
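The final step of contribution 4, obtaining the cooperative policy by value iteration on an expected-total-reward problem with a progress function, can be sketched on a small invented chain model: a progress reward is granted on reaching the milestone state, which steers every state's optimal action toward advancing (all states, probabilities, and rewards here are hypothetical).

```python
# Value-iteration sketch with a milestone progress reward (invented toy model):
# states 0..3, state 3 is the milestone (absorbing); action 0 "advance"
# succeeds w.p. 0.9, action 1 "stay" does nothing.
gamma = 0.95
P = {
    (0, 0): [(1, 0.9), (0, 0.1)],
    (1, 0): [(2, 0.9), (1, 0.1)],
    (2, 0): [(3, 0.9), (2, 0.1)],
    (0, 1): [(0, 1.0)], (1, 1): [(1, 1.0)], (2, 1): [(2, 1.0)],
}

def reward(s, s2):
    return 1.0 if s2 == 3 and s != 3 else 0.0    # progress reward at the milestone

def q(s, a, V):
    """One-step lookahead value of action a in state s."""
    return sum(p * (reward(s, s2) + gamma * V[s2]) for s2, p in P[(s, a)])

V = [0.0] * 4
for _ in range(200):                              # iterate to a fixed point
    V = [max(q(s, a, V) for a in (0, 1)) for s in range(3)] + [0.0]

policy = {s: max((0, 1), key=lambda a: q(s, a, V)) for s in range(3)}
print(policy)   # every state chooses the "advance" action
```

Because value increases monotonically toward the milestone, the greedy policy advances from every state, which is the mechanism the progress function uses to make agents satisfy milestones one by one instead of deferring them to the end of the full LTL specification.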
Keywords/Search Tags:Markov Decision Processes, Linear Temporal Logic, Probabilistic Model Checking, Info-gap Decision Theory, Delayed Q Learning, Multi-objective Decision Making, Receding Horizon