With the development of intelligent decision-making technology, deep reinforcement learning algorithms based on artificial neural networks (ANNs) have been successfully applied to agent behavior decision-making, but their demand for computing resources limits their use where such resources are scarce. Therefore, taking the spiking neural network (SNN), with its high computational efficiency, strong state-representation ability, and good biological interpretability, as the carrier, this work studies SNN-based behavior strategies for a single agent in discrete and continuous action spaces, as well as cooperative behavior strategies for multiple agents, as follows.

Firstly, the operating mechanism of SNNs is analyzed in depth from three aspects: the neuron computation model, the network topology, and the spike coding scheme. According to their different applications, reinforcement learning algorithms based on value functions and on policy gradients are then introduced respectively.

Secondly, to address the limited generality of conventional behavior strategies designed from accurate models, an end-to-end decision-making method for discrete action spaces based on ANN-to-SNN conversion is proposed. To correct the conversion error that arises when an ANN is converted to an SNN, a residual membrane potential method is introduced to adjust the firing rate of the last layer of spiking neurons, and an experience-replay scheme is designed that stores experience under a least recently used (LRU) mechanism and reuses it through priority-based mixed sampling, improving the convergence speed of the algorithm. Simulation results show that the proposed method achieves better decision-making ability and faster convergence than traditional methods.

Then, to avoid the curse of dimensionality caused by discretizing continuous actions, a continuous-action-space behavior decision method based on the spiking actor network (SAN)-deep critic
network (DCN) hybrid framework is proposed. Because the interval between two adjacent spikes of a conventional leaky integrate-and-fire (LIF) neuron can be long, the overshoot voltage is used to improve the LIF neuron computation model. On this basis, a neuron population encoding method is used to encode the environmental information, and a surrogate gradient function is used to train the two networks jointly. Taking robot mapless navigation as an example, simulation results show that, compared with traditional deep learning algorithms, the proposed method saves the mapping time and achieves a higher navigation success rate, thereby improving task efficiency.

Finally, because behavior strategies designed for a single agent are difficult to apply directly to multi-agent collaborative decision-making, a multi-agent behavior decision-making method based on prior knowledge is proposed. Built on an actor-critic framework with centralized training and distributed execution, the algorithm introduces prior knowledge into the critic and actor parts of each agent through parameter sharing and an attention mechanism, providing effective information guidance for the agents during training. Simulation results show that, compared with traditional deep learning algorithms, this method effectively improves the decision-making performance of multiple agents in collaborative scenarios.
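As an illustration of the LIF dynamics mentioned above, the following is a minimal discrete-time sketch of an LIF neuron layer. It uses a soft (subtractive) reset so that the membrane potential above threshold, the overshoot voltage, is retained rather than discarded; this reset rule and all parameter values are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def lif_step(v, input_current, v_th=1.0, tau=2.0):
    """One discrete-time update of a layer of LIF neurons.

    v             -- membrane potentials from the previous step
    input_current -- synaptic input at this step
    v_th          -- firing threshold (assumed value)
    tau           -- membrane time constant (assumed value)
    """
    v = v + (input_current - v) / tau        # leaky integration toward the input
    spikes = (v >= v_th).astype(np.float32)  # fire where threshold is reached
    v = v - spikes * v_th                    # soft reset: keep the overshoot voltage
    return v, spikes

# Example: only the first neuron receives enough current to fire,
# and its residual potential after reset keeps the overshoot.
v, s = lif_step(np.zeros(3), np.array([2.4, 0.6, 1.0]))
```

With a hard reset (setting fired neurons back to a fixed resting value), the overshoot information would be lost between spikes; the subtractive reset above is one simple way to preserve it across time steps.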
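The experience-replay scheme described for the discrete-action method (LRU storage with priority-based mixed sampling) can be sketched as follows. The abstract does not specify the priority definition or the mixing ratio, so the half-priority/half-uniform split and the unit default priority below are illustrative assumptions.

```python
import random
from collections import OrderedDict

class LRUPriorityBuffer:
    """Experience buffer that evicts the least recently used transition
    when full and samples batches with a mix of priority-weighted and
    uniform draws (mixing rule is an assumption, not the thesis's)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = OrderedDict()  # insertion/access order tracks recency
        self.next_key = 0

    def add(self, transition, priority=1.0):
        if len(self.buffer) >= self.capacity:
            self.buffer.popitem(last=False)  # evict least recently used entry
        self.buffer[self.next_key] = (transition, priority)
        self.next_key += 1

    def sample(self, batch_size):
        keys = list(self.buffer)
        n_prio = batch_size // 2
        # first half: draw with probability proportional to priority
        weights = [self.buffer[k][1] for k in keys]
        chosen = random.choices(keys, weights=weights, k=n_prio)
        # second half: uniform draw over the whole buffer
        chosen += random.choices(keys, k=batch_size - n_prio)
        batch = []
        for k in chosen:
            self.buffer.move_to_end(k)  # sampled entries become recently used
            batch.append(self.buffer[k][0])
        return batch
```

Marking sampled transitions as recently used means frequently reused experience survives eviction longer, which is one plausible reading of combining LRU storage with priority-driven reuse.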