Computer simulation has become the principal technical means in artillery fire-strike training. In research on artillery combat drill systems based on deep reinforcement learning, the large battle space makes the battlefield situation highly complex. If only tabular reinforcement learning is used to obtain the artillery agent's maneuver strategy, the computer must store very large state tables and the curse of dimensionality arises. On the other hand, applying deep reinforcement learning directly makes the training process difficult to converge, so obtaining the artillery agent's maneuver strategy takes a long time. To address these problems, a dueling deep Q-network decision-making method based on prioritized experience replay (Dueling DQN-PR) is proposed. The specific research contents are as follows.

(1) An artillery intelligent decision-making method based on the deep Q-network (DQN) is designed. After the complex battlefield situation data are discretized, the DQN algorithm is used to train the neural network. The resulting artillery agent possesses basic intelligent behaviors such as environment perception, fire perception, and maneuver, and can thus complete the fire-strike drill task.

(2) An artillery intelligent decision-making system based on the double deep Q-network (DDQN) is proposed. The DQN algorithm suffers from overestimation during training, so the DDQN algorithm is adopted; it improves the computation of the target Q value on the basis of DQN by separating action selection from action evaluation, yielding a more stable and effective strategy. Experimental results show that the DDQN algorithm effectively improves training stability.

(3) An artillery intelligent decision-making method based on the Dueling DQN-PR deep Q-network is proposed. To save training time and improve training efficiency, the Dueling DQN-PR algorithm is used to train the neural network. When computing the Q value, the algorithm replaces the maximum of the action advantage function with its mean, and it gives higher sampling priority to transitions with a large absolute temporal-difference (TD) error to accelerate convergence. Experimental results show that the Dueling DQN-PR algorithm improves training stability while maintaining performance.

Overall, the experiments show that the DQN algorithm effectively solves the curse of dimensionality that arises in reinforcement learning, but its training process converges with difficulty and takes a long time. With the Dueling DQN-PR decision-making method, the training time in the artillery-agent fire-strike test is greatly reduced and training stability is significantly improved.
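The DDQN target computation summarized in (2) can be illustrated with a short sketch. This is a minimal example under assumed conventions, not the thesis implementation; the network objects, tensor shapes, and the discount factor gamma are assumptions introduced for illustration.

    import torch

    def ddqn_target(online_net, target_net, reward, next_state, done, gamma=0.99):
        # Illustrative sketch only: shows how DDQN separates action selection
        # from action evaluation when forming the target Q value.
        with torch.no_grad():
            # Action selection uses the online network ...
            next_q_online = online_net(next_state)                     # [batch, n_actions]
            best_action = next_q_online.argmax(dim=1, keepdim=True)    # [batch, 1]
            # ... while action evaluation uses the target network,
            # which reduces the overestimation seen in plain DQN.
            next_q = target_net(next_state).gather(1, best_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q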
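The dueling aggregation and the TD-error-based priority described in (3) can likewise be sketched as follows. Layer sizes, the priority exponent alpha, and the small constant eps are illustrative assumptions, not values taken from the thesis.

    import torch
    import torch.nn as nn

    class DuelingQNet(nn.Module):
        # Illustrative dueling architecture: separate value and advantage streams.
        def __init__(self, state_dim, n_actions, hidden=128):
            super().__init__()
            self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
            self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
            self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

        def forward(self, state):
            h = self.feature(state)
            v = self.value(h)
            a = self.advantage(h)
            # Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a'): the mean of the advantage
            # function replaces its maximum, which keeps the two streams
            # identifiable and makes training more stable.
            return v + a - a.mean(dim=1, keepdim=True)

    def replay_priority(td_error, alpha=0.6, eps=1e-6):
        # Transitions with a large absolute TD error receive higher replay
        # priority, so informative samples are revisited more often.
        return (td_error.abs() + eps) ** alpha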