
Efficient Hierarchical Reinforcement Learning Policy Optimization With Fuzzy Rules

Posted on: 2022-11-05
Degree: Master
Type: Thesis
Country: China
Candidate: W Shi
Full Text: PDF
GTID: 2532307169479314
Subject: Management Science and Engineering
Abstract/Summary:
Complex adversarial games are characterized by dynamic confrontation: the action space is huge, the constraints are complex, and planning is difficult. Developing an efficient and intelligent task-planning method is therefore of great importance; such a method should make full use of game-process data and search for the optimal policy, i.e., the one with the largest long-term reward. Mainstream research on this problem falls into two categories: traditional planning methods based on rule reasoning and search optimization, and trial-and-error policy-search methods such as reinforcement learning. The first kind is difficult to deploy rapidly because a precise mathematical model is hard to formalize, while reinforcement learning is difficult to apply in practice because of its low sample efficiency and poor generalization and transfer. In recent years, researchers have paid increasing attention to making full use of limited prior knowledge in combination with trial-and-error methods such as reinforcement learning to solve task planning in complex confrontation environments. To the best of our knowledge, however, the study of how to feasibly combine prior knowledge with reinforcement learning is still at an early stage. Moreover, the relevant methods have mainly been tested in game environments with closed boundaries and clear rules, and have not been studied in strongly adversarial environments with complex rules, dynamic constraints, and unknown action spaces, such as wargame systems.

To address this problem, this paper proposes an intelligent decision-making framework that integrates knowledge-driven and data-driven methods and is characterized by their co-evolution and mutual promotion. The framework realizes the fast combined optimization of rule reasoning and trial-and-error learning in the policy space, and it accelerates the application of intelligent decision technology to practical planning problems.

Based on this framework, an efficient hierarchical reinforcement learning algorithm with fuzzy rules (HFR) is proposed as a concrete instantiation. The algorithm consists of two parts: a reinforcement learning algorithm and a fuzzy-rule inference module. In the reinforcement learning part, an adaptive off-policy hierarchical reinforcement learning algorithm is adopted in place of traditional HRL; it dynamically adjusts the decision frequency of the upper-level policy in real time and enables off-policy training of the hierarchical algorithm. In the inference part, fuzzy logic and fuzzy rules are introduced to unify the representation of prior knowledge, and a trainable fuzzy-rule inference module is designed.

Based on the platform of the National Wargame Deduction Competition, a typical air-combat scenario is designed and comparative experiments are carried out. Extensive experimental results show that HFR converges quickly and that its optimal policy is clearly better than those of the other models; HFR is also insensitive to the number and quality of the rules. From an analysis of the training-process data, five kinds of long-term tactics are summarized: autonomous formation, converging attack, use of the maximum attack range, fast maneuvers to avoid attack, and consuming enemy ammunition.

The main innovations of this paper are as follows:

(1) A general knowledge-driven and data-driven intelligent decision-making framework is proposed, which forms a solution in which rule reasoning and reinforcement learning promote each other and co-evolve, accelerating policy optimization. Research on combining prior knowledge with reinforcement learning is not yet mature: traditional planning methods struggle to establish fine-grained models and to adjust rapidly to new situations, while reinforcement learning suffers from low sample efficiency, slow convergence, and poor generalization. This paper constructs an end-to-end decision framework that combines rule reasoning with reinforcement learning. The inference module leverages the inference results of the fuzzy rule sets to assist the reinforcement learning agent in making decisions, realizing the mutual promotion of the two and improving decision-making ability.

(2) Fuzzy rules are introduced and a fuzzy-rule inference module is constructed; a fuzzy representation of human prior knowledge is designed, and the rule encoding is unified. The lack of a unified representation of prior knowledge is one of the obstacles preventing traditional rule inference from being integrated with reinforcement learning algorithms. In this paper, fuzzy logic is introduced to express rules, unifying the representation of prior knowledge. Because the membership functions of fuzzy sets are differentiable, the inference module can be integrated organically into reinforcement learning training, realizing co-optimization.

(3) The traditional hierarchical reinforcement learning algorithm is improved. An adaptive switch module is designed to dynamically schedule the decision frequency of the upper-level policy, and a sample-modification technique is proposed to enable off-policy training of the hierarchical algorithm. In current hierarchical reinforcement learning algorithms, the decision frequency of the upper-level policy cannot be set dynamically, which increases the workload of manual tuning; therefore an adaptive switch module is added to adjust the decision timing of the upper-level policy according to the agent's state and the lower-level policy. Traditional hierarchical reinforcement learning also cannot be trained off-policy, so its sample efficiency is low; the proposed sample-modification technique overcomes environmental non-stationarity by modifying historical experience samples, enabling off-policy training.

Finally, experiments on a typical air-combat scenario on the platform of the National Wargame Deduction Competition show that the proposed algorithm is superior to the other algorithms in convergence speed and in the quality of the optimal policy. Based on an analysis of the experiment-process data, five kinds of long-term tactics that emerged from the agents are summarized.
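The abstract does not give the concrete form of the trainable fuzzy-rule inference module, but its key property — differentiable membership functions whose outputs can weight rule suggestions — can be sketched. The following is a minimal illustration, not the thesis's implementation: it assumes Gaussian membership functions, a product t-norm for rule firing strength, and a firing-strength-weighted average as the rule base's action suggestion; all names and the toy rule base are hypothetical.

```python
import numpy as np

def gaussian_membership(x, center, width):
    """Differentiable fuzzy membership: degree to which x belongs to a fuzzy set."""
    return np.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def fuzzy_inference(state, rule_centers, rule_widths, rule_actions):
    """Evaluate a small fuzzy rule base on a state vector.

    Rule i reads: 'IF the state is near rule_centers[i] THEN suggest rule_actions[i]'.
    Firing strength is a product t-norm over state dimensions; the output is the
    firing-strength-weighted average of the rules' suggested actions.
    """
    # (n_rules, state_dim) membership degrees via broadcasting
    memberships = gaussian_membership(state[None, :], rule_centers, rule_widths)
    strengths = memberships.prod(axis=1)            # product t-norm per rule
    weights = strengths / (strengths.sum() + 1e-8)  # normalized firing strengths
    return weights @ rule_actions                   # weighted action suggestion

# Two illustrative rules over a 2-D state (e.g. distance-to-enemy, own ammo).
centers = np.array([[0.2, 0.8],    # rule 1: close range, plenty of ammo
                    [0.9, 0.1]])   # rule 2: long range, low ammo
widths = np.full((2, 2), 0.3)
actions = np.array([1.0, -1.0])    # +1: attack, -1: retreat (toy encoding)

suggestion = fuzzy_inference(np.array([0.25, 0.7]), centers, widths, actions)
```

Because every operation here is smooth in `centers`, `widths`, and `actions`, the same computation expressed in an autodiff framework can be trained jointly with the policy network, which is what makes co-optimization with reinforcement learning possible.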
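The sample-modification technique for off-policy hierarchical training is likewise only described at a high level. One common way such a correction works — a sketch under that assumption, not the thesis's actual method — is to relabel a stored upper-level decision with the subgoal that best explains the logged lower-level behavior under the *current* lower-level policy, so that old experience remains consistent after the lower level has changed. All names below are hypothetical.

```python
import numpy as np

def relabel_upper_sample(candidate_subgoals, lower_states, lower_actions, lower_policy):
    """Return the candidate subgoal under which the logged lower-level actions are
    most consistent with the current lower-level policy.

    lower_policy(state, subgoal) -> predicted action (deterministic sketch);
    candidates are scored by negative squared error against the logged actions,
    so the best-scoring subgoal 'explains' the stored trajectory segment.
    """
    def score(g):
        preds = np.array([lower_policy(s, g) for s in lower_states])
        return -np.sum((preds - lower_actions) ** 2)
    return max(candidate_subgoals, key=score)

# Toy example: a lower-level policy that moves straight toward its subgoal.
lower_policy = lambda s, g: g - s
logged_states = np.array([0.0, 0.5])
logged_actions = np.array([1.0, 0.5])   # consistent with subgoal 1.0

relabeled = relabel_upper_sample([0.0, 0.5, 1.0],
                                 logged_states, logged_actions, lower_policy)
```

Storing `relabeled` in place of the original upper-level action is what removes the non-stationarity: the replayed transition now describes behavior the current lower level would actually produce, so the upper level can be trained off-policy from old data.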
Keywords/Search Tags: Hierarchical Reinforcement Learning, Fuzzy Rules, Intelligent Decision Technology, Imperfect Information Environment, Knowledge-driven, Data-driven, Wargame