Research On Decision-Making Of Beyond-Visual-Range Air Combat Based On Multi-Agent Reinforcement Learning

Posted on: 2019-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: W L Yuan
GTID: 2392330611493306
Subject: Control Science and Engineering
Abstract/Summary:
With the continuing development of air-to-air missiles, modern air combat has entered the beyond-visual-range (BVR) era. Any serious and capable adversary will try to avoid short-range engagement and instead seek an advantage through superior BVR capabilities. The air combat battlefield is complex, dynamic, and uncertain; it is a fierce game in which the situation can change in an instant. Research on practical BVR decision-making for multi-aircraft coordinated attack is therefore significant. This thesis models the BVR decision-making problem as a partially observable Markov decision process (POMDP) and designs a reward function model based on BVR situation assessment theory. Building on an analysis of the process and elements of typical BVR air combat and of multi-agent reinforcement learning algorithms, it proposes a decision-making method for multi-aircraft air combat. The main work is as follows.

First, a model of the multi-aircraft BVR decision-making problem is established. Based on an analysis of radar operating states and the characteristics of the missile attack zone, models of radar detection and of the missile attack zone are built. Considering the complexity and dynamics of multi-aircraft coordination, BVR air combat is abstracted as an incomplete cooperation problem and modeled as a POMDP. After analyzing the traditional reward function used in POMDP modeling, a new reward function model is designed based on BVR air combat situation assessment theory; it is continuous and accelerates learning convergence.

Second, a decision-making method for multi-aircraft BVR air combat is designed. The principle and characteristics of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm are analyzed, and MADDPG is applied to complex multi-aircraft air combat scenarios with continuous state and action spaces. To suit the characteristics of BVR air combat tasks, the exploration strategy is improved: simulated annealing is combined with an Ornstein-Uhlenbeck random process and applied to the action output by the policy network, which reduces useless and inefficient exploration and helps balance exploration and exploitation during learning. The structures of the policy network and the value network are designed according to the POMDP model, and an algorithm for multi-aircraft combat is presented.

Finally, a simulation environment is set up, and experiments are designed to validate the improved reward function model and to test multi-aircraft air combat decision-making. The environment is built on Ubuntu; the agent decision-making program communicates with the Gazebo simulation environment through the messaging mechanism of the Robot Operating System (ROS). A verification experiment in a one-on-one confrontation scenario is carried out first, based on the analysis of the reward function model. The learning model using the reward function proposed here converges faster and performs better than the traditional one. The BVR air combat experiment is then carried out in Gazebo: the improved reward function model and the proposed algorithm are applied to multi-aircraft BVR decision-making and compared with a reinforcement learning model trained with the DDPG algorithm. The results show that the proposed method can effectively solve the BVR air combat decision-making problem, with faster network convergence and higher reward than the DDPG-trained model.
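The exploration scheme mentioned in the abstract (Ornstein-Uhlenbeck noise combined with simulated annealing, applied to the policy network's output action) might look roughly like the sketch below. The abstract does not say how the two are combined, so this sketch assumes the simplest reading: a geometric cooling schedule, borrowed from simulated annealing, scales the OU noise amplitude. The class name `AnnealedOUNoise`, the helper `explore`, and all parameter values are hypothetical and are not taken from the thesis.

```python
import math
import random


class AnnealedOUNoise:
    """Zero-mean Ornstein-Uhlenbeck noise scaled by a cooling temperature.

    Assumption: "annealing" here means a geometric cooling schedule on the
    noise amplitude. Parameter names and defaults are illustrative only.
    """

    def __init__(self, dim, theta=0.15, sigma=0.2, dt=1e-2,
                 t_start=1.0, cooling=0.995, t_min=0.01, seed=None):
        self.dim = dim
        self.theta = theta          # mean-reversion rate of the OU process
        self.sigma = sigma          # diffusion scale of the OU process
        self.dt = dt                # integration time step
        self.temperature = t_start  # current annealing temperature
        self.cooling = cooling      # geometric cooling factor per call
        self.t_min = t_min          # floor so exploration never vanishes
        self.rng = random.Random(seed)
        self.state = [0.0] * dim

    def reset(self):
        """Reset the OU state, e.g. at the start of an episode."""
        self.state = [0.0] * self.dim

    def sample(self):
        """Advance the OU process one Euler-Maruyama step and scale it."""
        sqrt_dt = math.sqrt(self.dt)
        self.state = [
            x - self.theta * x * self.dt
            + self.sigma * sqrt_dt * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return [self.temperature * x for x in self.state]

    def anneal(self):
        """Cool the temperature geometrically, clamped at t_min."""
        self.temperature = max(self.t_min, self.temperature * self.cooling)


def explore(action, noise, low=-1.0, high=1.0):
    """Add annealed OU noise to a policy action and clip to the bounds."""
    perturbed = [a + n for a, n in zip(action, noise.sample())]
    return [min(high, max(low, p)) for p in perturbed]
```

In a training loop one would call `explore` on each agent's policy output at every step and `anneal` once per episode, so that early episodes explore broadly and later ones stay close to the learned policy.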
Keywords/Search Tags: Reinforcement Learning, Decision-making of BVR, Multi-Agent, POMDP, MADDPG, Reward Function