| With the continuous progress of intelligent technology, computer wargaming has developed rapidly. How to control the agents in a wargame so that they confront each other in an "anthropomorphic" (human-like) way, and thereby provide new ideas for developing tactics and researching command decision-making, has become a hot research direction. In this paper, a rule-based control method grounded in logical reasoning and a model-training method based on reinforcement learning are used to study the dynamic decision-making of agents. First, a rule-controlled agent is designed with a behavior tree: a scenario is analyzed with the SWOT-CLPV method and the overall decision-making goal of the blue side is defined; operational process analysis is used to decompose combat behavior level by level; according to the decomposition results, behavior subtrees are built from various node types, and the complete command-and-control process model is assembled by aggregating the subtrees layer by layer, so that combat units can perform basic actions in an "anthropomorphic" way. Introducing the behavior tree improves the modularity of the model and makes its subsequent development, maintenance, and upgrading easier. Second, because combat units controlled by the behavior-tree model tend to follow a single, fixed path when attacking and evading, this paper proposes an optimization method based on spatial discretization. The perception space is separated from the action space: the grid method divides the main combat area to construct a situation-awareness subspace, and the visible point method constructs the action space that guides the movement of combat units, making the agents' battlefield responses more flexible. Finally, a reinforcement learning algorithm is used to design an agent that acts as an intelligent blue force for identifying and closing loopholes in tactical design. Based on scenario analysis, the force is split into small-scale combat units, and an action decision-making model for the agent is built with the DDPG algorithm. In addition, guided by threat analysis, angle and distance rewards are added to the score-based reward, which effectively improves exploration efficiency. Introducing reinforcement learning enriches the agent's movement trajectories and lets combat units launch attacks from more accurate positions, thereby improving the hit rate. |
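The behavior-tree design summarized above builds subtrees by composing leaf conditions and actions under composite nodes. The following Python sketch shows only the standard Sequence and Selector node semantics. The unit subtree, the blackboard keys `target_distance` and `weapon_range`, and the `fire`/`advance` actions are hypothetical illustrations, not the thesis's actual tree.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


class Node:
    """Base class: every node is ticked with a shared blackboard (situation data)."""

    def tick(self, blackboard: dict) -> Status:
        raise NotImplementedError


class Condition(Node):
    """Leaf: SUCCESS if the predicate holds on the blackboard, else FAILURE."""

    def __init__(self, predicate: Callable[[dict], bool]):
        self.predicate = predicate

    def tick(self, blackboard: dict) -> Status:
        return Status.SUCCESS if self.predicate(blackboard) else Status.FAILURE


class Action(Node):
    """Leaf: runs a behaviour function and returns the status it reports."""

    def __init__(self, fn: Callable[[dict], Status]):
        self.fn = fn

    def tick(self, blackboard: dict) -> Status:
        return self.fn(blackboard)


class Sequence(Node):
    """Composite: ticks children in order; returns the first non-SUCCESS status."""

    def __init__(self, children: List[Node]):
        self.children = children

    def tick(self, blackboard: dict) -> Status:
        for child in self.children:
            status = child.tick(blackboard)
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS


class Selector(Node):
    """Composite: ticks children in order; returns the first non-FAILURE status."""

    def __init__(self, children: List[Node]):
        self.children = children

    def tick(self, blackboard: dict) -> Status:
        for child in self.children:
            status = child.tick(blackboard)
            if status != Status.FAILURE:
                return status
        return Status.FAILURE


# Hypothetical leaf behaviours: they only log what a real unit would do.
def fire(bb: dict) -> Status:
    bb["log"].append("fire")
    return Status.SUCCESS


def advance(bb: dict) -> Status:
    bb["log"].append("advance")
    return Status.RUNNING  # movement takes several ticks


# Hypothetical unit subtree: fire if the target is in range, otherwise advance.
unit_tree = Selector([
    Sequence([
        Condition(lambda bb: bb["target_distance"] <= bb["weapon_range"]),
        Action(fire),
    ]),
    Action(advance),
])

bb = {"target_distance": 12.0, "weapon_range": 8.0, "log": []}
print(unit_tree.tick(bb), bb["log"])  # → Status.RUNNING ['advance']
```

Every subtree exposes the same `tick` interface. A subtree such as `unit_tree` can therefore be nested as a child of a higher-level composite. This uniform interface is what makes layer-by-layer aggregation, and later maintenance of individual subtrees, modular.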
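As a rough illustration of using the grid method to build a situation-awareness subspace, the sketch below does two things. It discretizes a rectangular combat area into square cells, and it counts how many enemy threat circles cover each cell centre. The cell size, the `(x, y, threat_radius)` enemy representation, and the coverage-count encoding are assumptions made for illustration. The thesis may encode situation features differently.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def to_cell(x: float, y: float, origin: Point, cell_size: float,
            n_cols: int, n_rows: int) -> Optional[Tuple[int, int]]:
    """Map a continuous position to its (row, col) cell, or None if outside the area."""
    col = int((x - origin[0]) // cell_size)
    row = int((y - origin[1]) // cell_size)
    if 0 <= row < n_rows and 0 <= col < n_cols:
        return row, col
    return None


def threat_grid(enemies: List[Tuple[float, float, float]], origin: Point,
                cell_size: float, n_cols: int, n_rows: int) -> List[List[int]]:
    """For each cell, count enemies (x, y, threat_radius) whose circle covers the cell centre."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        for col in range(n_cols):
            cx = origin[0] + (col + 0.5) * cell_size
            cy = origin[1] + (row + 0.5) * cell_size
            for ex, ey, radius in enemies:
                if math.hypot(cx - ex, cy - ey) <= radius:
                    grid[row][col] += 1
    return grid


# Hypothetical example: 4-by-4 cells of size 10, one enemy at (5, 5) with radius 10.
grid = threat_grid([(5.0, 5.0, 10.0)], origin=(0.0, 0.0), cell_size=10.0, n_cols=4, n_rows=4)
for line in grid:
    print(line)
```

In this example, only cells (0, 0), (0, 1), and (1, 0) are marked as threatened, because their centres lie within 10 units of the enemy. A grid of this kind is decoupled from how the unit actually moves, which mirrors the separation of perception space from action space.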
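One plausible reading of the visible point method is sketched below. The action space is a fixed set of candidate waypoints, and a unit may choose a waypoint only if the straight segment to it does not cross an obstacle. The waypoint coordinates, the representation of obstacles as wall segments, and all function names are hypothetical. In this sketch, segments that merely touch a wall at an endpoint, or run collinear with it, are treated as unobstructed.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Wall = Tuple[Point, Point]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (>0 counter-clockwise, <0 clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True only for a proper crossing; touching or collinear contact returns False."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def visible(a: Point, b: Point, walls: List[Wall]) -> bool:
    """Line of sight between a and b if the segment properly crosses no wall."""
    return not any(segments_cross(a, b, w1, w2) for w1, w2 in walls)


def visibility_graph(points: List[Point], walls: List[Wall]) -> Dict[int, List[int]]:
    """Adjacency lists linking every pair of mutually visible waypoints."""
    graph: Dict[int, List[int]] = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if visible(points[i], points[j], walls):
                graph[i].append(j)
                graph[j].append(i)
    return graph


def action_space(unit_pos: Point, points: List[Point], walls: List[Wall]) -> List[int]:
    """Indices of the waypoints the unit can move to directly from unit_pos."""
    return [i for i, p in enumerate(points) if visible(unit_pos, p, walls)]


# Hypothetical scenario: four waypoints and one wall segment from (5, -1) to (5, 4).
waypoints = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
walls = [((5.0, -1.0), (5.0, 4.0))]
print(visibility_graph(waypoints, walls))  # → {0: [2, 3], 1: [2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
print(action_space((2.0, 0.0), waypoints, walls))  # → [0, 3]
```

Restricting each move to currently visible waypoints yields a small discrete action set that still changes with the unit's position. This is one way to obtain varied attack and evasion routes instead of a single fixed path.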
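For reference, the standard DDPG updates that such an action decision-making model builds on are given below. These are the generic algorithm equations, not details taken from this thesis. The symbols are as follows:

- $\mu$ is the actor and $Q$ is the critic, with primes marking the target networks.
- $d_i$ is a terminal flag, and $N$ is the minibatch size.
- The discount $\gamma$, the soft-update rate $\tau$, and the exploration noise $\mathcal{N}_t$ are hyperparameters whose values in the thesis are not specified here.

```latex
\begin{aligned}
a_t &= \mu(s_t \mid \theta^{\mu}) + \mathcal{N}_t \\
y_i &= r_i + \gamma\,(1 - d_i)\, Q'\!\left(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right) \\
L(\theta^{Q}) &= \frac{1}{N} \sum_{i} \left(y_i - Q(s_i, a_i \mid \theta^{Q})\right)^2 \\
\nabla_{\theta^{\mu}} J &\approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)} \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i} \\
\theta^{Q'} &\leftarrow \tau\,\theta^{Q} + (1 - \tau)\,\theta^{Q'}, \qquad
\theta^{\mu'} \leftarrow \tau\,\theta^{\mu} + (1 - \tau)\,\theta^{\mu'}
\end{aligned}
```

Because the actor outputs a continuous action directly, DDPG suits movement decisions such as heading and speed. The reward $r_i$ in the critic target is the quantity that any reward shaping modifies.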
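The reward design described above (a score-based reward plus angle and distance terms guided by threat analysis) can be sketched as a shaped reward. The specific functional forms here are assumptions for illustration rather than the thesis's formulas:

- the angle term is the cosine of the heading error toward the target;
- the distance term is an exponential that peaks at a preferred engagement range `best_range`;
- the weights `w_angle` and `w_dist` scale the two terms.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def shaped_reward(score_delta: float, agent_pos: Point, agent_heading: float,
                  target_pos: Point, best_range: float = 5.0, range_scale: float = 5.0,
                  w_angle: float = 0.1, w_dist: float = 0.1) -> float:
    """Score-based reward plus hypothetical angle and distance shaping terms.

    agent_heading is in radians, measured like math.atan2 (0 = +x axis).
    """
    dx = target_pos[0] - agent_pos[0]
    dy = target_pos[1] - agent_pos[1]
    dist = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx)
    # Heading error wrapped into [-pi, pi].
    err = math.atan2(math.sin(bearing - agent_heading), math.cos(bearing - agent_heading))
    angle_reward = math.cos(err)  # +1 when facing the target, -1 when facing away
    # Peaks at 1 when dist == best_range and decays with |dist - best_range|.
    distance_reward = math.exp(-abs(dist - best_range) / range_scale)
    return score_delta + w_angle * angle_reward + w_dist * distance_reward


print(shaped_reward(0.0, (0.0, 0.0), 0.0, (5.0, 0.0)))      # facing target at best range
print(shaped_reward(0.0, (0.0, 0.0), math.pi, (5.0, 0.0)))  # facing away at best range
```

In a training loop, a function of this kind would replace the raw score change as the reward stored for each transition. The dense angle and distance terms give the agent a learning signal even before any score is earned, which is one way such shaping can improve exploration efficiency. Keeping the weights small prevents the shaping terms from dominating the score itself.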