| This work constructs a fully communicating homogeneous multi-agent system based on the Decentralized Markov Decision Process. Coverage path planning for multi-agent systems usually consumes substantial resources and has a high repetition rate. To address this problem, a series of event-based heuristically accelerated reinforcement learning (HARL) algorithms is proposed. The content of this work is as follows. First, the related theory of HARL is introduced. Finding optimal control policies with reinforcement learning algorithms can be very time-consuming. Therefore, a heuristic function is combined with a reinforcement learning algorithm, Q-learning for example, to accelerate the learning process. In a multi-agent system, the heuristic information likewise guides exploration and reduces the number of visits to the huge strategy space, so the learning process is accelerated as well. Second, multi-agent coverage under various heuristic functions is studied. Different definitions of the heuristic function yield different kinds of HARL. An event-triggered mechanism is used to improve three HARL algorithms and to optimize the results of multi-agent coverage. Because the acquisition of prior knowledge in Heuristically Accelerated Q-Learning (HAQL) is uncertain, an event-triggered multi-agent HAQL algorithm is designed. It avoids both a lack of information and redundant computation in the structure extraction stage. The construction of the heuristic is triggered by the degree of prior knowledge acquisition. Multi-agent coverage experiments show that this algorithm accelerates the coverage process and saves computing resources while still guaranteeing the optimal strategy. Because the cost function is subjective, it is difficult for Heuristically Accelerated State Backtracking Q-Learning (HASB-QL) to find the optimal strategy. An event-triggered mechanism is used to evaluate each agent's own observation. When the observation changes only slightly, agents adopt the HASB-QL algorithm to simplify the joint action choice and
accelerate the learning process. When the observation changes substantially, agents adopt the Q-learning algorithm instead to search for a new strategy. Multi-agent coverage experiments show that this algorithm suppresses the negative heuristic effect of the cost function, and multi-agent coverage is achieved with less resource consumption and a low repetition rate. For the Case-Based Heuristically Accelerated Q-Learning (CB-HAQL) algorithm, an inaccurate or incomplete case base may introduce wrong heuristics into the learning process. To address this problem, an event-triggered multi-agent CB-HAQL algorithm is designed to control the learning process in two respects. On the one hand, the procedure for building and updating the case base is triggered by the strategy of each episode. On the other hand, case retrieval is triggered by the similarity between the current state and the case. The multi-agent coverage results show that this algorithm reduces the resource consumption and repetition rate of coverage, and that the event-triggered mechanism improves the heuristic suggested by the case base. |
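
The event-triggered switch between heuristic-guided and plain Q-learning action selection described above can be illustrated with a minimal single-agent sketch. It is not the algorithm developed in this work. It assumes the commonly used HAQL form of the heuristic, in which only the suggested action receives a bonus just large enough to make it greedy. The names `heuristic_values` and `select_action`, the scalar `obs_change` measure, and the parameters `threshold`, `xi`, `eta` and `epsilon` are all hypothetical and chosen only for illustration.

```python
import random


def heuristic_values(q_row, suggested_action, eta=1.0):
    """Assumed HAQL-style heuristic: raise only the suggested action
    above the current best Q-value by a margin eta; others get 0."""
    h = [0.0] * len(q_row)
    h[suggested_action] = max(q_row) - q_row[suggested_action] + eta
    return h


def select_action(q_row, suggested_action, obs_change,
                  threshold=0.2, xi=1.0, epsilon=0.1, rng=random):
    """Event-triggered action choice (illustrative assumption).

    Small observation change -> heuristic-guided greedy choice.
    Large observation change -> plain greedy choice on Q alone.
    With probability epsilon a random action is explored either way.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    if obs_change <= threshold:
        h = heuristic_values(q_row, suggested_action)
        scores = [q + xi * hv for q, hv in zip(q_row, h)]
    else:
        scores = list(q_row)
    # Index of the highest score (first one on ties).
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    q = [0.5, 0.2, 0.9]
    # Small change: the heuristic's suggestion (action 1) is followed.
    print(select_action(q, suggested_action=1, obs_change=0.05, epsilon=0.0))
    # Large change: the agent falls back to the greedy Q-learning choice.
    print(select_action(q, suggested_action=1, obs_change=0.8, epsilon=0.0))
```

In this sketch the trigger is a simple threshold on a scalar change measure. How the observation change is actually quantified, and how joint actions are handled across agents, is not specified in the text above and is left open here.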