Since the end of the last century, intelligent unmanned weapons have played a leading role in several major local wars around the world, and with the explosive growth of artificial intelligence technology at the beginning of this century, calls for enabling the future battlefield with artificial intelligence have grown steadily. With the rapid development of artificial intelligence and swarm intelligence, a new opportunity has appeared in this field: the cluster (swarm) application of intelligent unmanned weapons, which offers more strategic and tactical options at every level of confrontation and may have a disruptive impact on the future battlefield.

Against this background, this paper studies the application of reinforcement learning to heterogeneous multi-agent confrontation, focusing on the sparse-reward problem. It covers problem formulation, model construction, and program design, and verifies the proposed algorithms by simulation in three task scenarios: cluster attack-defense confrontation, cluster escort confrontation, and cluster alert confrontation. The main contents are as follows:

(1) Scenario models and confrontation rules are designed for the research problem. Three typical task scenarios are proposed: attack-defense confrontation, escort confrontation, and alert confrontation, and each is described and analyzed. Based on heterogeneous clusters, the types and quantities of the units composing each scenario are then specified. To enable cooperative strategic confrontation among multiple unit types, the study defines units with different attributes, such as attack units, defense units, and detection units, and assigns attribute values with different characteristics to each type (a schematic unit definition is sketched below). Finally, confrontation strategies and win/loss rules are designed separately for the three mission scenarios.

(2) On the basis of the scenario models, a local reward reshaping method is proposed to address the reward sparsity that arises when reinforcement learning is applied to heterogeneous multi-agent confrontation (see the sketch below). The effectiveness of the method for policy learning is verified in the cluster attack-defense, cluster escort, and cluster alert scenarios.

(3) The reward expansion based on local reward reshaping is effective, but because reward sparsity in this domain is severe, the reward signal obtained in this way is still insufficient and seriously slows training. The study therefore further adopts prioritized experience replay (sketched below) and verifies it in the attack-defense confrontation scenario; the results show that prioritized experience replay better mitigates the reward sparsity problem.
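For point (1), the following is a minimal sketch of how heterogeneous unit types with different attribute values might be represented. The attribute names and numerical values are purely illustrative assumptions and are not taken from the thesis itself.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    """Schematic heterogeneous unit; attribute names are illustrative only."""
    role: str            # "attack", "defense", or "detection"
    max_speed: float     # movement capability
    attack_range: float  # 0.0 for units that cannot attack
    detect_range: float  # sensing radius
    hit_points: float    # survivability

# Hypothetical examples of three unit types with different characteristics.
ATTACK_UNIT  = Unit("attack",    max_speed=1.0, attack_range=0.8, detect_range=1.0, hit_points=1.0)
DEFENSE_UNIT = Unit("defense",   max_speed=0.6, attack_range=0.5, detect_range=1.2, hit_points=2.0)
DETECT_UNIT  = Unit("detection", max_speed=1.2, attack_range=0.0, detect_range=2.5, hit_points=0.5)
```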
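For the local reward reshaping of point (2), the abstract does not specify the exact form of the local term. The sketch below assumes a standard potential-based shaping scheme driven by each agent's distance to its current objective; the observation fields, the discount factor, and the scaling factor beta are hypothetical, not details reported in the thesis.

```python
def shaped_reward(global_reward, prev_obs, obs, gamma=0.99, beta=0.1):
    """Local reward reshaping sketch (potential-based shaping assumed).

    global_reward : sparse team reward emitted by the environment
    prev_obs, obs : one agent's observation before/after the step; we assume
                    it exposes a distance to the agent's objective
    """
    # Potential function: closer to the objective -> higher potential.
    phi_prev = -prev_obs["distance_to_objective"]
    phi_curr = -obs["distance_to_objective"]
    # Potential-based shaping term, scaled by beta so the dense local
    # signal does not dominate the true sparse objective.
    shaping = beta * (gamma * phi_curr - phi_prev)
    return global_reward + shaping
```

Because the shaping term is potential-based, it densifies the per-agent reward without changing which joint policies are optimal, which is why it is a common way to realize a "local" reward on top of a sparse global one.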
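For point (3), the technique is read here as prioritized experience replay. The sketch below shows a minimal proportional variant; the buffer capacity and the alpha/beta hyper-parameters are illustrative defaults, not values from the study.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay (illustrative sketch)."""

    def __init__(self, capacity=100_000, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priority affects sampling
        self.eps = eps              # keeps every priority strictly positive
        self.data, self.priorities = [], []
        self.pos = 0

    def add(self, transition, td_error=1.0):
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(priority)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights = weights / weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha
```

In a sparse-reward confrontation task, the few transitions that actually carry reward tend to produce large TD errors and are therefore replayed far more often, which is the intuition behind using this technique to speed up training.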