As air combat plays an increasingly important role in modern warfare, research on air combat decision-making methods has taken on particular practical significance. Because the air combat situation is complex and rapidly changing, how to quickly perceive the battlefield environment and generate an accurate and effective air combat strategy has become an important research direction for the air combat game. Air combat decision-making methods such as expert systems, influence diagrams, matrix games, and differential games have made progress, but these traditional approaches suffer from poor adaptability, heavy computation, and difficulty in meeting real-time requirements. Deep reinforcement learning, which has risen rapidly in recent years, has shown great advantages in solving decision-making problems. This paper therefore proposes an algorithm that combines game theory with deep reinforcement learning to improve the adaptability and intelligence of air combat decision-making in game scenarios.

To address the dimensional explosion that traditional reinforcement learning algorithms face in the rapidly changing air combat game, as well as the difficulty of predicting the opponent's decisions and of generating effective counter-strategies, this paper proposes an air combat decision-making algorithm that combines the Nash equilibrium of game theory with the Deep Q-Network (DQN), namely the Minimax-DQN algorithm. First, the algorithm uses a neural network to perceive the air combat situation and estimate the value of fighter maneuvers in a continuous state space, and it alleviates the training instability caused by correlated samples through experience replay and a separate target network. Second, during training, a game-theory-based ε-minimax exploration and exploitation strategy is used to ensure the diversity of training samples and the rationality of sequential decisions. Third, the Minimax algorithm solves the optimal mixed strategy by linear programming, and the action to execute is drawn from the resulting probability distribution by roulette selection. Finally, after sufficient training, the neural network can quickly perceive the air combat situation, output an optimal decision sequence in real time against the opponent's maneuvering strategy, and guide the fighter to exploit the situation and win the engagement, with better intelligence and adaptability. In addition, simulation experiments in a constructed football game scenario verify the feasibility of the Minimax-DQN algorithm in a game environment and its superiority over the traditional DQN algorithm.

In the air combat simulation design, the two-dimensional and three-dimensional close-range air combat problems are first described and analyzed, and a dynamic model of the fighter is built from its kinematic and dynamic equations. Second, the main factors affecting the air combat situation are analyzed, the fighter's flight states and maneuver set are designed, and the problem is abstracted as a Markov game model. Third, for the complex and changeable battlefield confrontation environment, advantageous air combat situations are defined, and a corresponding advantage reward function is designed to guide the fighter to quickly perceive the combat situation and learn the optimal maneuver strategy. Finally, two-dimensional and three-dimensional close-range air combat game simulation environments are constructed, fighter-versus-fighter training and decision-making evaluation are carried out, and the traditional DQN algorithm is used for comparison experiments in the same environment.
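To make the minimax solution step described above concrete, the following is a minimal sketch (not the thesis implementation) of solving the mixed strategy of a zero-sum matrix game by linear programming and drawing an action from it by roulette selection. It assumes NumPy and SciPy are available; all function and variable names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog  # assumed dependency for the linear program

def solve_minimax_mixed_strategy(payoff):
    """Mixed minimax strategy of a zero-sum matrix game.

    payoff[i, j] is assumed to be the value to our fighter when it takes
    maneuver i and the opponent takes maneuver j (e.g. the Q-value table
    the network outputs for one state).  Returns (pi, v): the probability
    distribution over our maneuvers and the game value.
    """
    m, n = payoff.shape
    # Variables x = [pi_1, ..., pi_m, v]; maximize v <=> minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every opponent maneuver j:  sum_i pi_i * payoff[i, j] >= v,
    # written for linprog as  -payoff[:, j] . pi + v <= 0.
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # The probabilities must sum to one.
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * m + [(None, None)]  # the game value v is unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

def roulette_select(pi, rng=np.random):
    """Roulette-wheel selection: sample a maneuver index from the mixed strategy."""
    cumulative = np.cumsum(pi)
    return int(np.searchsorted(cumulative, rng.uniform(0.0, cumulative[-1])))
```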
The experimental results show that, through self-learning in a complex air combat environment, the Minimax-DQN algorithm can intelligently adjust its own situation to avoid risk and occupy a favorable position under different initial situations and against opponents with different strategies. It achieves good results in the game scenario and outperforms the DQN algorithm, showing strong adaptability and a clear advantage. In addition, it can generate a maneuvering decision within 5 ms, meeting the real-time requirements of air combat game confrontation.
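Building on the sketch above and under the same assumptions, the fragment below illustrates one plausible reading of the ε-minimax exploration rule and of the Minimax-DQN training target: with probability ε a random maneuver is explored, otherwise a maneuver is drawn from the minimax mixed strategy, and the temporal-difference target replaces DQN's max over actions with the game value computed from the target network. The exact forms used in the thesis may differ; q_net and q_target_net are hypothetical callables that return the Q-value matrix for a state.

```python
def epsilon_minimax_action(q_net, state, epsilon, num_maneuvers, rng=np.random):
    """ε-minimax exploration: a random maneuver with probability epsilon,
    otherwise a maneuver sampled (by roulette selection) from the minimax
    mixed strategy of the current Q-value matrix."""
    if rng.uniform() < epsilon:
        return rng.randint(num_maneuvers)
    pi, _ = solve_minimax_mixed_strategy(q_net(state))
    return roulette_select(pi, rng)

def minimax_dqn_targets(batch, q_target_net, gamma=0.95):
    """TD targets for a minibatch drawn from the experience replay buffer.

    Vanilla DQN bootstraps with max_a Q(s', a); here the bootstrap term is
    the value of the zero-sum matrix game formed by the target network's
    Q(s', a_own, a_opp) table, solved by the linear program above.
    Each transition is (state, own_action, opp_action, reward, next_state, done).
    """
    targets = []
    for state, a_own, a_opp, reward, next_state, done in batch:
        if done:
            targets.append(reward)
        else:
            _, game_value = solve_minimax_mixed_strategy(q_target_net(next_state))
            targets.append(reward + gamma * game_value)
    return np.array(targets)
```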