The problem of offensive and defensive confrontation between agents is a classic adversarial problem; traces of it appear in settings ranging from games to battlefield decision-making. Taking the attack task of a combat unit as the research background, this thesis studies the problem of intelligent confrontation and controls the real-time decision-making behavior of agents with deep reinforcement learning algorithms. The main research contents of this thesis are as follows.

First, we model the combat unit for intelligent confrontation and, based on this model, analyze and mathematically define the confrontation process. We then design and implement an intelligent confrontation system environment.

Second, in the dual-agent confrontation scenario, to address the low sample utilization of reinforcement learning, we introduce a dual-experience-pool mechanism that separates successful and failed experiences into two pools, improving the learning efficiency of samples. To address the problem of exploring different types of actions in this scenario, we introduce an exploration strategy that mixes Ornstein-Uhlenbeck (OU) noise and Gaussian noise, improving exploration efficiency across action types. To address the sparse-reward problem, we design a dense reward function that guides the agent to complete the attack task more efficiently. Experiments in the confrontation environment verify the correctness and effectiveness of the improved DDPG algorithm.

Third, in the multi-agent confrontation scenario, we address the partial observability of agents, the non-stationarity of the learning environment, and agent inertia. We introduce a centralized-training, distributed-execution framework, a bidirectionally coordinated neural network architecture, and a reward mechanism that mixes individual and collective rewards. In the
confrontation environment, experiments verify the effectiveness and superiority of the improved DNE-DDPG algorithm over other benchmark algorithms.

Finally, in the multi-agent confrontation scenario, we introduce the idea of hierarchical learning to address the curse of dimensionality in the action space and the sparse rewards that arise in collaborative decision-making among multiple agents. We divide the confrontation process into a high-level sub-policy selection process and a low-level action execution process: the agents' low-level actions are trained with the proximal policy optimization (PPO) algorithm, while the high-level sub-policy selection is trained with imitation learning. Experiments in the confrontation environment verify the effectiveness of this hierarchical reinforcement learning method.
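The abstract does not give implementation details for the dual-experience-pool mechanism or the mixed exploration noise; the following is only an illustrative sketch of the two ideas. All class names, capacities, and mixing ratios here are assumptions for illustration, not the thesis's actual parameters.

```python
import random
from collections import deque

import numpy as np


class DualExperiencePool:
    """Two replay buffers: one for transitions from successful episodes and
    one from failed episodes. Batches mix both, so rare successful
    experiences are replayed more often than uniform sampling would allow."""

    def __init__(self, capacity=100_000, success_ratio=0.5):
        self.success = deque(maxlen=capacity)
        self.failure = deque(maxlen=capacity)
        self.success_ratio = success_ratio  # fraction of each batch drawn from successes

    def add_episode(self, transitions, succeeded):
        # Route a whole episode's transitions to the matching pool.
        (self.success if succeeded else self.failure).extend(transitions)

    def sample(self, batch_size):
        n_s = min(int(batch_size * self.success_ratio), len(self.success))
        n_f = min(batch_size - n_s, len(self.failure))
        batch = random.sample(list(self.success), n_s) + random.sample(list(self.failure), n_f)
        random.shuffle(batch)
        return batch


class MixedNoise:
    """OU noise (temporally correlated, suited to continuous movement
    actions) blended with Gaussian noise (uncorrelated, suited to other
    action types). The mix weight is a hypothetical tuning parameter."""

    def __init__(self, dim, theta=0.15, sigma_ou=0.2, sigma_gauss=0.1, mix=0.5):
        self.dim, self.theta = dim, theta
        self.sigma_ou, self.sigma_gauss, self.mix = sigma_ou, sigma_gauss, mix
        self.state = np.zeros(dim)

    def sample(self):
        # Ornstein-Uhlenbeck update toward mean 0: dx = theta*(0 - x) + sigma*N(0, 1)
        self.state += self.theta * (-self.state) + self.sigma_ou * np.random.randn(self.dim)
        gauss = self.sigma_gauss * np.random.randn(self.dim)
        return self.mix * self.state + (1.0 - self.mix) * gauss
```

At training time, the noise sample would be added to the deterministic actor's output before clipping to the action bounds, and each finished episode would be filed into the success or failure pool according to whether the attack task was completed.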
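The mixed individual/collective reward mechanism mentioned above can be read as a convex combination of each agent's own reward and a team-level reward. A minimal sketch follows; the weighting parameter `alpha` and the use of the team mean as the collective term are assumptions, since the abstract does not specify the exact blend.

```python
def mixed_reward(individual_rewards, alpha=0.5):
    """Blend each agent's individual reward with the team's mean reward.

    alpha = 1.0 -> purely individual credit; alpha = 0.0 -> purely
    collective credit shared equally by all agents.
    """
    team = sum(individual_rewards) / len(individual_rewards)
    return [alpha * r + (1.0 - alpha) * team for r in individual_rewards]
```

Interpolating between the two extremes is one way to counteract agent inertia: each agent still feels its own contribution, while the shared term keeps all agents' gradients aligned with the collective objective.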