
Research On Tactical Decision-making Of UCAV Based On Deep Reinforcement Learning

Posted on: 2021-01-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Hu
Full Text: PDF
GTID: 2392330611999790
Subject: Integrated circuit engineering
Abstract/Summary:
As the military demand for UCAVs grows, major countries are paying increasing attention to UCAV development, and UCAVs will play an ever more important role in future intelligent air combat. The air combat environment is complex and the battlefield situation changes rapidly, so UCAVs must perceive the battlefield situation accurately, make appropriate decisions, and take effective actions autonomously. Over decades of development, air combat decision-making has progressed through differential games, expert systems, and influence diagrams, as well as intelligent methods represented by artificial immune systems, genetic algorithms, and approximate dynamic programming. In recent years, deep reinforcement learning has made great progress on a variety of sequential decision problems and has had a profound impact on the development of AI as a whole. Based on deep reinforcement learning, this paper proposes an intelligent air combat decision-making algorithm that enables UCAVs to make decisions autonomously in complex combat environments.

This dissertation formulates the basic elements of a reinforcement learning model for the one-to-one air combat problem: the exact combat states, the library of optional maneuvers, and the kinematic rules in three-dimensional space. A reward function is designed for situation assessment that considers the relative angle, height, and speed between the two combating UCAVs; this reward function guides the UCAV to choose maneuvers appropriately.

Air combat is a sequential decision problem in a continuous state space, and general reinforcement learning methods may suffer from the curse of dimensionality, making air combat decision-making impractical. To address this problem, this paper proposes a method for approximately solving tactical action values in air combat. Exploiting the end-to-end representation learning ability of deep neural networks, a deep value network is used to approximate the action values. An incremental greedy method is adopted in the value-network training phase to trade off exploration and exploitation in combat strategy learning, which ensures the diversity of the air combat process and the rationality of long-term multi-step decision-making. After sufficient training, the neural network can accurately evaluate the instantaneously changing combat situation, compute the values of the relevant maneuvers, and select appropriate maneuvers, achieving high long-term reward and effective tactical planning.

In the simulation experiments, an enemy air combat strategy based on a Min-Max behavior search rule is designed. The intelligent UCAV is trained on a typical one-to-one air combat mission in 3D space, and its combat performance is evaluated. The simulation results show that the agent driven by the deep value network decision-making algorithm learns reasonable combat tactics after a certain period of learning, actively attacking the enemy or avoiding enemy threats. It shows clear advantages over the traditional Min-Max combat strategy and has a certain adaptive capability in complex air combat situations.
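The abstract does not give implementation details, so the following Python sketch only illustrates the kind of situation-assessment reward (relative angle, height, and speed) and incremental greedy maneuver selection described above. The function names (situation_reward, epsilon_greedy_maneuver), weights, and scaling constants are illustrative assumptions, not the author's actual formulation.

import numpy as np

def situation_reward(own_pos, own_vel, enemy_pos, enemy_vel,
                     w_angle=0.5, w_height=0.25, w_speed=0.25,
                     height_scale=1000.0, speed_scale=100.0):
    """Illustrative situation-assessment reward combining relative angle,
    height, and speed advantages; weights and scales are assumptions."""
    # Unit line-of-sight vector from own aircraft to the enemy.
    los = enemy_pos - own_pos
    los = los / (np.linalg.norm(los) + 1e-8)

    own_dir = own_vel / (np.linalg.norm(own_vel) + 1e-8)
    enemy_dir = enemy_vel / (np.linalg.norm(enemy_vel) + 1e-8)

    # Angle advantage: highest when we point at the enemy while the enemy
    # flies away from us (tail-chase geometry), lowest in the reverse case.
    angle_adv = 0.5 * (np.dot(own_dir, los) + np.dot(enemy_dir, los))

    # Height and speed advantages, squashed into (-1, 1).
    height_adv = np.tanh((own_pos[2] - enemy_pos[2]) / height_scale)
    speed_adv = np.tanh((np.linalg.norm(own_vel) - np.linalg.norm(enemy_vel)) / speed_scale)

    return w_angle * angle_adv + w_height * height_adv + w_speed * speed_adv

def epsilon_greedy_maneuver(value_network, state, n_maneuvers, epsilon):
    """Pick a maneuver index from the discrete maneuver library: explore a
    random maneuver with probability epsilon, otherwise exploit the current
    action-value estimates produced by the deep value network."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_maneuvers)
    q_values = np.asarray(value_network(state))   # expected shape: (n_maneuvers,)
    return int(np.argmax(q_values))

In such a scheme, epsilon is typically annealed from a large value toward a small one as training proceeds, so that early episodes explore the maneuver library broadly while later episodes exploit the learned action values, which is consistent with the incremental greedy exploration-exploitation trade-off mentioned in the abstract.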
Keywords/Search Tags: air-combat maneuver decision, function approximation, deep Q-value network, reinforcement learning