The unmanned combat air vehicle (UCAV) will occupy an increasingly important position in the field of cognitive electronic warfare because of its good stealth, strong maneuverability, and other advantages. Since a UCAV can attack enemy targets quickly, it plays an important role in advancing tactics. Path planning for attacking enemy targets is the main task of UCAV autonomous flight; however, existing path planning algorithms take a long time, produce discontinuous flight motion, and seldom consider physical constraints, which brings great difficulties to cognitive electronic warfare. Aiming at the path planning problem of the UCAV in a dynamic unknown environment, this paper applies deep reinforcement learning and obtains the optimal path by establishing a mapping from state to action, which improves the task completion rate of the UCAV. The main research contents are summarized as follows:

Firstly, the paper gives a general description of the UCAV path planning problem and clarifies the overall planning process. According to the forces acting on the UCAV in flight, the physical constraints that the UCAV must satisfy are set. Based on the modeling of the UCAV flight environment, four threat models are given, namely radar, missile, anti-aircraft gun, and electronic jamming, and the kill probability of each threat against the UCAV is analyzed.

Secondly, since existing UCAV path planning algorithms cannot adapt to a dynamic unknown environment, deep reinforcement learning is introduced into path planning. A neural network is used to approximate the value function over the state space, which solves the difficulty of storing states in tabular form when the state space of the UCAV is large. The states, actions, and reward functions of the UCAV are designed, and the Deep Q-Network (DQN) and Policy Gradient (PG) algorithms are used for training. The two algorithms are compared in terms of mean accumulated reward, average planning steps, and success rate, which verifies the applicability of deep reinforcement learning to UCAV path planning.

Thirdly, aiming at the problems that the DQN algorithm can only output discrete actions and that the PG algorithm converges slowly, the Deep Deterministic Policy Gradient (DDPG) and Soft Actor-Critic (SAC) algorithms, which combine DQN and PG under the Actor-Critic (AC) framework, are introduced to handle the continuous UCAV action space. The flight states and actions under the physical constraints of the UCAV are given, and the rewards under different threats are designed. The performance of the two algorithms in three different tasks is compared according to four performance indexes. The simulation results show that the deep reinforcement learning algorithms based on the AC framework can significantly improve the task completion rate of the UCAV in a continuous action space.

Finally, the overall architecture of the UCAV path planning software in the continuous action space is introduced, and the modular design process is given.
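The abstract does not give the formulas of the four threat models, but one common way to fold per-threat kill probabilities into the reward is to assume independent threats and penalize the combined kill probability. The sketch below is illustrative only: the aggregation rule, the progress term, and the weight are assumptions, not the thesis's actual reward design.

```python
# Hedged sketch: aggregate per-threat kill probabilities and use them as a step penalty.
# Both the independence assumption and the threat_weight value are illustrative only.
from typing import Sequence

def combined_kill_probability(per_threat: Sequence[float]) -> float:
    """Assuming independent threats, survival probabilities multiply: P_kill = 1 - prod(1 - p_i)."""
    survival = 1.0
    for p in per_threat:
        survival *= (1.0 - p)
    return 1.0 - survival

def step_reward(dist_to_goal_prev: float, dist_to_goal: float,
                kill_probs: Sequence[float], threat_weight: float = 10.0) -> float:
    """Reward progress toward the target and penalize exposure to the modeled threats
    (radar, missile, anti-aircraft gun, electronic jamming)."""
    progress = dist_to_goal_prev - dist_to_goal
    return progress - threat_weight * combined_kill_probability(kill_probs)
```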
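As a rough illustration of the state-to-action mapping and the discrete-action setting used with DQN in the second part, the following sketch shows a small Q-network over an assumed UCAV state vector and a discretized set of heading changes. The state layout, network sizes, and action set are illustrative assumptions rather than the thesis's actual design.

```python
# Minimal sketch of the state-to-action mapping with a Q-network (PyTorch).
import torch
import torch.nn as nn

STATE_DIM = 6   # assumed state, e.g. (x, y, heading, dist_to_goal, bearing_to_goal, dist_to_nearest_threat)
N_ACTIONS = 7   # assumed discretized heading changes, e.g. {-30, -20, -10, 0, +10, +20, +30} degrees

class QNetwork(nn.Module):
    """Approximates Q(s, a) for every discrete action, replacing a tabular Q-table."""
    def __init__(self, state_dim: int = STATE_DIM, n_actions: int = N_ACTIONS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def greedy_action(q_net: QNetwork, state: torch.Tensor) -> int:
    """Greedy policy: pick the heading change with the largest estimated Q-value."""
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())
```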
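For the continuous action space of the third part, a DDPG-style deterministic actor can bound its output to the vehicle's physical limits. The turn-rate and acceleration bounds below are assumed placeholders for the UCAV physical constraints, not values taken from the thesis.

```python
# Illustrative sketch of a DDPG-style actor whose output is scaled to assumed physical limits.
import torch
import torch.nn as nn

MAX_TURN_RATE = 0.3   # rad per step, assumed bound
MAX_ACCEL = 5.0       # m/s^2, assumed bound

class Actor(nn.Module):
    """Maps a state vector to a bounded continuous action (turn rate, acceleration)."""
    def __init__(self, state_dim: int = 6, action_dim: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # squashes output to [-1, 1]
        )
        # Scale the squashed output to the physical limits of the vehicle.
        self.register_buffer("bounds", torch.tensor([MAX_TURN_RATE, MAX_ACCEL]))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state) * self.bounds
```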