
Research On Dyna Model Learning Algorithms For Multi-unmanned Platform

Posted on: 2019-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Wei
Full Text: PDF
GTID: 2392330611493632
Subject: Control Science and Engineering
Abstract/Summary:
Multi-UAV cooperative reconnaissance and surveillance has a wide range of applications in intelligence gathering and battlefield surveillance. How to deploy multiple UAVs in complex environments to perform reconnaissance and surveillance tasks is one of the key issues affecting their future application. Such environments tend to be highly dynamic, uncertain, and adversarial, so modeling the problem and designing algorithms that let multiple UAVs carry out continuous reconnaissance and surveillance in them is a challenging topic. This thesis studies the following:

(1) First, a model of the multi-UAV cooperative reconnaissance and surveillance problem is established. The goal of reconnaissance and surveillance is to obtain the maximum intelligence value within a specified time while minimizing confrontation with possible threats in the environment. Most previous studies have modeled the problem as a multi-objective optimization problem that requires pre-planning, an approach that is difficult to apply in a dynamic and uncertain environment. This thesis instead abstracts the problem as a multi-agent information collection problem and models it as a Partially Observable Markov Decision Process (POMDP), which accounts for the dynamic, uncertain, and adversarial nature of the environment and is closer to the real situation.

(2) Second, algorithms for solving the multi-UAV cooperative reconnaissance and surveillance problem are designed. Given the time pressure and the threats in the environment, the algorithm should converge quickly and minimize the number of interactions with the environment in order to avoid threats. Model learning algorithms in reinforcement learning fit these requirements, so the thesis uses them to solve the POMDP. The traditional Dyna-Q model learning algorithm converges slowly, has an overly large state space, and provides no cooperation among multiple agents; the thesis designs several algorithms to address these problems. First, expert knowledge is added to the learning process, yielding a Dyna algorithm based on prioritized sweeping and an algorithm based on stochastic dominant heuristic search. Next, a tree structure replaces the traditional table for storing state-action information, giving a Dyna-Q algorithm based on a tree structure. This method reduces storage and builds the environment model more effectively and quickly; the stored information is then used in the planning process, which greatly speeds up convergence. Finally, building on the tree-structured model, the thesis introduces multi-agent knowledge sharing and proposes a Dyna-Q algorithm based on it. In this method, each agent actively shares the environmental information it knows with the other agents, so that the agents can construct an environment model quickly and the algorithm converges faster.

(3) The thesis designs a simulation experiment for the multi-UAV cooperative reconnaissance and surveillance problem, solves the problem with the proposed algorithms, and verifies their effectiveness. Performance is measured by the cumulative return obtained by multiple UAVs over 6000 time steps; the larger the return, the faster the algorithm converges. The results show that the cumulative returns of Dyna-PS, Dyna-, Dyna-Tree, and Dyna-Sharing are 2.9, 3.4, 4.7, and 6.9 times that of the traditional Dyna-Q algorithm, respectively. The proposed algorithms therefore improve substantially on traditional Dyna-Q, and the UAVs obtain more information value within the specified time.
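
For orientation, the traditional Dyna-Q baseline that the abstract compares against can be sketched roughly as follows. This is the textbook tabular form (a direct Q-learning update, a learned deterministic model, and planning updates replayed from that model), not the thesis's prioritized-sweeping, tree-based, or knowledge-sharing variants. The corridor environment, the function names `dyna_q` and `corridor_step`, and all hyperparameter values are illustrative assumptions that do not come from the thesis.

```python
import random
from collections import defaultdict


def corridor_step(state, action, goal=5):
    """Hypothetical toy environment: a 1-D corridor with reward 1 at the goal cell."""
    next_state = min(max(state + action, 0), goal)
    done = next_state == goal
    return (1.0 if done else 0.0), next_state, done


def dyna_q(step, start, actions, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, max_steps=200, seed=0):
    """Tabular Dyna-Q for a deterministic environment (assumed for simplicity)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-values indexed by (state, action)
    model = {}              # learned model: (state, action) -> (reward, next_state, done)

    def greedy(s):
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            # Epsilon-greedy action selection.
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)        # one real interaction with the environment
            update(s, a, r, s2, done)       # direct reinforcement learning
            model[(s, a)] = (r, s2, done)   # model learning
            for _ in range(planning_steps):
                # Planning: replay a uniformly sampled transition from the model.
                (ps, pa), (pr, ps2, pdone) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps2, pdone)
            if done:
                break
            s = s2
    return q


q = dyna_q(corridor_step, start=0, actions=[-1, 1])
policy = {s: max([-1, 1], key=lambda a: q[(s, a)]) for s in range(5)}
```

Each real step here triggers `planning_steps` simulated updates, which is why Dyna-style methods need fewer environment interactions than plain Q-learning. Prioritized sweeping, in its usual formulation, replaces the uniform sampling in the planning loop with a priority queue ordered by the size of the expected value change.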
Keywords/Search Tags: Multiple Unmanned Platforms, Model Learning, Dyna Structure, Prioritized Sweeping, Tree Structure, Multi-Agent Information Sharing
Related items