| With the continuing development of deep reinforcement learning, more and more algorithms are emerging. At the same time, information warfare is intensifying, so improving commanders' command ability has become necessary for coping with the coming intelligent information age. Intelligent planning of operational tasks means that, according to the superior's intention, based on an assessment of objective conditions such as the enemy situation, the friendly situation, and the battlefield environment, and with the support of an intelligent operational task planning system, the overall operational task is decomposed automatically, the operational tasks of each force are divided and arranged intelligently, and the tasks are monitored in real time, predicted dynamically, and replanned in emergencies during execution. Its essence lies in the transformation among military resources, the initial situation, and the expected situation, and it includes a series of command activities such as battlefield situation analysis, generation of the operational concept, generation of the operational plan, and determination of the operational decision. Information warfare is characterized by multiple forces, multidimensional space, and the fast defeating the slow. It therefore requires an intelligent operational task planning system that supports interactive preparation of operational plans, optimal scheduling of operational resources, automatic control of planning processes, and visual presentation and intelligent evaluation of planning results, so as to give commanders and decision-makers the technical support to complete combat mission planning accurately and efficiently. Traditional command training systems are costly and their opponents are not intelligent enough. To address these problems, this paper proposes using the SC2LE platform with an inverse reinforcement learning method to train command decision-making ability autonomously, so that command ability can be further optimized on the basis of expert command strategies and styles, thereby making command training more intelligent. The main work of this paper is as follows. Firstly, command action sequences are analyzed and summarized, and the commander's command decisions are modeled as a Markov decision process. Secondly, an inverse reinforcement learning mechanism that meets the requirements of command decision-making planning is designed, from which feature vectors are abstracted, covering command style, weight, efficiency, damage level, etc. Thirdly, so that the command system can carry out inverse reinforcement learning as early as possible, this paper designs expert command data acquisition interface software that determines the command expert's process parameter definitions, command style, feature values, association parameters, learning weights, etc., and models the action objects, on-site environment, and influencing factors according to the acquired parameters. At the same time, example action sequences of different command experts are planned according to the rules, so that they form quantifiable and instructive learning action sequences. Finally, because the open action interface of the SC2LE platform has a huge action space, which makes the corresponding actions difficult to learn with reinforcement learning, the atomic actions provided by the interface are abstracted into high-level macro actions for commanders. The effectiveness and feasibility of the algorithm are analyzed through different experiments. |
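
The abstract does not say which inverse (reverse) reinforcement learning algorithm is used. As a purely illustrative sketch, the following assumes a reward that is linear in the feature vector and takes one feature-matching step in the style of apprenticeship learning. The two features, the discount factor, and all function names are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

# A trajectory is a list of feature vectors, one per decision step.
# The features (e.g. "efficiency", "damage level") are hypothetical.
Trajectory = List[Sequence[float]]


def feature_expectations(trajectories: List[Trajectory], gamma: float = 0.9) -> List[float]:
    """Average discounted sum of feature vectors over a set of trajectories."""
    dim = len(trajectories[0][0])
    mu = [0.0] * dim
    for traj in trajectories:
        for t, phi in enumerate(traj):
            for i in range(dim):
                mu[i] += (gamma ** t) * phi[i]
    return [m / len(trajectories) for m in mu]


def reward_weights(mu_expert: Sequence[float], mu_learner: Sequence[float]) -> List[float]:
    """Unit-norm weights pointing from the learner's feature expectations toward the expert's."""
    diff = [e - l for e, l in zip(mu_expert, mu_learner)]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff] if norm > 0 else diff


def reward(weights: Sequence[float], phi: Sequence[float]) -> float:
    """Linear reward: dot product of the weights and a feature vector."""
    return sum(w * f for w, f in zip(weights, phi))


# Toy data: the expert emphasizes feature 0, the current learner feature 1.
expert_demos = [[(1.0, 0.0), (1.0, 0.0)]]
learner_rollouts = [[(0.0, 1.0), (0.0, 1.0)]]

mu_e = feature_expectations(expert_demos, gamma=0.5)
mu_l = feature_expectations(learner_rollouts, gamma=0.5)
w = reward_weights(mu_e, mu_l)
```

Under this assumption, steps whose features resemble the expert's receive a positive reward and steps resembling the learner's current behavior receive a negative one. A full method would alternate this step with retraining the policy on the recovered reward.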