| As technology develops, a single agent can no longer handle increasingly complex problems. Large-scale practical applications therefore require more and more agents working together, which has drawn growing attention from researchers to multi-agent technology and led to its rapid development over the last twenty years. Reinforcement learning, a hotspot of multi-agent research, has after major innovation become able to search for optimal solutions in multi-agent systems, and Q-learning is one of its main learning algorithms. The ant colony algorithm is an optimization algorithm that simulates the swarm intelligence of ant colony behavior. It is derived by studying the intelligent behavior of real ant colonies and abstracting that behavior into a theoretical algorithm, which provides new ways to deal with many problems. This article introduces the pheromone concept of the ant colony algorithm into the multi-agent system, combining it with the Q-learning algorithm and incorporating the pheromone into action instructions. With the pheromone introduced, the system considers not only the environmental information but also the pheromone and their combined effect when making decisions. This strengthens the information interaction between agents, and therefore effectively improves the learning efficiency of the original algorithm and lets the agents cooperate better toward their objectives. For the pheromone update strategy in this combination, an improved ant colony update mode is adopted. First, the pheromone evaporation factor ρ is adjusted self-adaptively, giving the Q-learning algorithm combined with an ant colony algorithm with a self-adaptive pheromone evaporation factor (APEF_Q), which improves the search capability and convergence efficiency of the original algorithm. Computer 
simulation results on the basic stalking problem model show that the improved algorithm performs significantly better than the original algorithm. In addition, for the pheromone intensity, a dynamic time-varying function replaces the constant term, which gives the Q-learning algorithm combined with an ant colony algorithm with self-adaptive pheromone (AP_Q). The performance of this improved algorithm is analyzed through a more difficult computer simulation on a three-dimensional stalking problem model with a more complicated environment. Compared with the original algorithm, the improved algorithm proves more reliable, and its efficiency improves more noticeably. Finally, on the more difficult three-dimensional simulation model, the basic Q-learning algorithm, the Q-learning algorithm combined with the basic ant colony algorithm (Ant_Q), and the two Q-learning algorithms combined with improved ant colony algorithms (APEF_Q and AP_Q) are compared experimentally. The experimental results show that the Q-learning algorithms combined with the pheromone of the ant colony algorithm perform better than the basic Q-learning algorithm. |
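
The abstract does not give the formulas used, so the following is only a minimal sketch of the general idea: an agent picks actions from its Q-values plus a pheromone term, and the pheromone evaporates by a factor ρ before new pheromone is deposited. The additive combination rule, the function names (`select_action`, `update_pheromone`, `q_update`) and the parameters `beta`, `rho`, `deposit`, `alpha`, `gamma` and `epsilon` are all assumptions made for illustration, not the author's method. The Q-value update itself is the standard tabular Q-learning rule.

```python
import random


def select_action(q_values, pheromone, beta=1.0, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over Q-value plus weighted pheromone.

    The additive score q + beta * pheromone is an assumed combination rule.
    """
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore
    return max(actions, key=lambda a: q_values[a] + beta * pheromone.get(a, 0.0))


def update_pheromone(pheromone, action, rho=0.1, deposit=1.0):
    """Evaporate every trail by rho, then deposit on the chosen action.

    A fixed rho and a constant deposit are used here; the adaptive variants
    described in the abstract would vary them over time.
    """
    for a in pheromone:
        pheromone[a] *= (1.0 - rho)
    pheromone[action] = pheromone.get(action, 0.0) + deposit
    return pheromone


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update."""
    next_values = q.get(next_state)
    best_next = max(next_values.values()) if next_values else 0.0
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]
```

In this sketch the pheromone table acts as the shared signal between agents: if several agents read and write the same `pheromone` dictionary, each agent's choice is influenced by what the others did recently, which is one way to realise the information interaction the abstract describes.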