| Intelligent combat decision-making is a new form of combat decision-making built on artificial intelligence methods. By using artificial intelligence methods to support both the process and the results of combat decision-making activities, it aims to achieve faster, more stable, and more efficient decision-making responses, and to build an advantage over the enemy through improved combat decision-making capabilities. However, combat decision-making poses special challenges such as strong confrontation, strict real-time requirements, incomplete information, and uncertain boundaries. Directly applying a single intelligent method usually leads to low efficiency, poor generalization, and weak robustness, which makes it difficult to meet the basic requirements of adaptability, agility, and ease of use that combat decision-making tasks place on intelligent methods. To this end, the thesis proposes a solution from the perspective of integration: it takes the Reinforcement Learning (RL) methods commonly used in intelligent combat decision-making as the core and designs a specific integration framework and corresponding optimization methods. Specifically, starting from the concept of integrated intelligence and the mechanisms of RL, the thesis first designs an integrated framework of RL methods based on an analysis of the requirements of intelligent combat decision-making. It then focuses on the key capabilities required for intelligent combat decision-making and optimizes the specific methods at each key link of the integration framework so that they better adapt to its special challenges. The thesis carries out the following six studies. In Chapter 2, we propose an integrated framework of RL methods adapted to the operational decision-making environment, which clarifies the key issues that need to be addressed in 
each link of the integration framework for the key capabilities required to realize intelligent operational decision-making. To address the difficulty of applying a single intelligent method to complex combat decision-making tasks, the thesis analyzes the basic concepts of integrated intelligence and the integration mechanisms of RL methods in light of the specific requirements of building integrated intelligence for intelligent combat decision-making. We propose a way to classify and combine RL methods at four levels: the architecture scheme, the pattern type, the optimization direction, and the representation method. Then, after reviewing related RL work under this new classification, we analyze the decision-making characteristics and adaptability of each method against the capability requirements of intelligent combat decision-making. On this basis, we design the corresponding RL integration framework, analyze the key issues to be addressed at each link of the framework, and delineate the scope of research for the subsequent chapters. In Chapter 3, we propose a dynamic hierarchical reinforcement learning (HRL) method to meet the requirement that the architecture scheme in the integrated framework adapt effectively to different combat missions and environments. The thesis designs a dynamic HRL model structure and, on this basis, provides a nested exploration and exploitation mechanism to address the challenges brought by inter-level dependencies. The method can adjust its hierarchical structure dynamically and adaptively to different environments. This allows HRL to serve as the architectural method in the integrated framework and to exert its advantages on many types of complex long-sequence decision-making tasks, while reducing the trial-and-error cost of selecting a hierarchical structure and ensuring the overall efficiency 
of the method. In addition, a distributed training architecture is proposed, and a corresponding adaptive evolution method is designed on top of it to accelerate convergence. The experimental results demonstrate the effectiveness and performance advantages of the proposed method in adaptively determining the optimal hierarchical structure. In Chapter 4, we propose a Gaussian-Process-based integrated planning and learning strategy training method. It innovates on the integration method at the pattern-type layer of the framework and explores how to form combat decision-making capabilities rapidly when no pre-plan is available. To meet the basic capability requirements of intelligent combat decision-making tasks, such capabilities must still form quickly from small-sample data without pre-plans, and the heavy dependence of traditional deep reinforcement learning (DRL) methods on data volume must be reduced. The thesis therefore combines the Gaussian Process (GP) method, which adapts well to small-sample problems, with the Deep Q-Learning method, and studies a GP-based training method that fuses planning and learning. In addition, a data selection mechanism based on KL divergence is designed to evaluate how well the GP model describes the environment and the quality of the generated data. The thesis tests the method on dialogue strategy training. The experimental results demonstrate the effectiveness and robustness of the method, with the task success rate improved by about 20%. In Chapter 5, we propose an integrated strategy training method for continuous-action decision-making based on the Gaussian Process and proximal policy optimization. It avoids having the integrated framework forcibly convert an existing continuous-action decision strategy into a discrete-action one in some intelligent combat decision-making 
scenarios, which would significantly affect training speed and performance. The existing pattern-type integration method in the framework is still limited to discrete action spaces. Building on the previous part of the research, and to better handle the continuous decision-making problems common in intelligent combat decision-making, the thesis studies an integrated planning and learning strategy training method for continuous action spaces, and designs a loss function combined with the proximal policy optimization method to assist the training of the world model. To verify the effectiveness of the method, the thesis tests the proposed algorithm on two typical autopilot simulators. The experimental results demonstrate the advantages of the method in convergence, robustness, and performance on continuous-action decision-making problems. In Chapter 6, we study an RL training method based on time-coded spiking (pulse) neural networks that is easy to deploy on unmanned equipment. Starting from the representation method level, it provides a new integrated method to support intelligent combat decision-making that can take effect more readily on unmanned equipment. The method aims to meet the energy consumption requirements of unmanned equipment while ensuring the timeliness and effectiveness of terminal decision-making. To solve the problem that the derivative does not exist when time coding is applied to backpropagation-based learning, we design an auto-increment variable that is introduced into each spiking neuron to make the neural network fully differentiable. In addition, a recoverable coding method is proposed to solve the loss of time-coded input information in asynchronous networks. Experimental results show that, in the benchmark decision-making 
task of reinforcement learning, this method achieves performance comparable to existing DRL methods while retaining the low power consumption and low latency of the brain-like approach itself. In Chapter 7, we put forward the "acceptability" principle and its application measures. They provide theoretical support for the effective integration of human intelligence and tact at the representation level, and promote effective deployment and application capabilities in human-centered combat entities and businesses. How to integrate and fuse human and machine intelligence, organically combining human intentions with the formalization of machines to improve decision-making tasks, is another important issue in ensuring effective deployment and application in combat entities and businesses. Systematic theoretical research from an integrated perspective is currently lacking in this field, which leaves the guiding principles unclear. The thesis therefore first systematically reviews the interpretability theories mainly used in current machine learning research, starting from an analysis of the characteristics of the representation method. We then discuss the limitations of human-machine intelligent fusion based on this principle in the operational decision-making environment, discuss the basic concepts of human-machine intelligent fusion in a combat decision-making environment, and on this basis put forward the concept of "acceptability". The thesis also gives specific application principles for this concept as a new guideline for forming human-machine intelligence integration in the combat decision-making environment. |
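
The KL-divergence-based data selection summarized for Chapter 4 can be illustrated with a minimal sketch. The abstract does not specify how the divergence is computed or applied, so everything here is an assumption made for illustration: predictions are treated as univariate Gaussians (mean, standard deviation), a simulated sample is kept when its divergence from a reference distribution is below a threshold, and the names `gaussian_kl`, `select_simulated`, and `threshold` are hypothetical.

```python
import math


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(P || Q) in nats for univariate Gaussians P and Q."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
        - 0.5
    )


def select_simulated(simulated, reference, threshold):
    """Keep simulated (mean, std) predictions close to the reference.

    Assumption: a small divergence from the reference distribution is taken
    as a sign that the model-generated sample is trustworthy enough to use.
    """
    mu_q, sigma_q = reference
    return [
        (mu, sigma)
        for mu, sigma in simulated
        if gaussian_kl(mu, sigma, mu_q, sigma_q) < threshold
    ]


# Hypothetical usage: one sample near the reference, one far from it.
kept = select_simulated([(0.1, 1.0), (3.0, 1.0)], reference=(0.0, 1.0), threshold=0.5)
```

In this toy setup only the first sample passes the filter; the threshold is a tuning choice in the sketch, not a value reported by the thesis.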