Deep reinforcement learning algorithms are currently among the most common approaches for solving complex sequential decision-making tasks: the reward obtained from the interaction between the agent and the environment is used to update the policy so as to maximize the cumulative reward obtained by the agent. This capability for sequential optimal decision making makes deep reinforcement learning a promising component of autonomous intelligent systems.

However, deep reinforcement learning algorithms suffer from low sample efficiency. The trial-and-error mechanism of these algorithms makes it difficult for an agent to obtain successful samples, so a large amount of collected data cannot actually be used for policy optimization. Such inefficient interaction is time-consuming and potentially operationally dangerous in real-world systems. Introducing effective priors is a feasible way to improve the efficiency of deep reinforcement learning algorithms. Therefore, this thesis focuses on deep reinforcement learning algorithms based on prior knowledge extraction. The main research contributions and innovations of this thesis are as follows.

(1) For tasks where humans can provide structured prior rules, this thesis proposes the DQDR algorithm (Deep Q network with Domain Rules), which initializes the deep Q-network in reinforcement learning using structured prior rules. The algorithm first obtains task-general constraint rules, or a subset of task-related structured prior rules, from human experts, and then transfers these structured prior rules into the deep Q-network to initialize it. Transferring the prior knowledge into the deep Q-network addresses two problems at once: the structured prior is difficult to optimize directly from data, and a randomly initialized Q-network makes the algorithm hard to train during the initial interaction phase. Through this transfer, the prior-initialized Q-network carries knowledge from the structured prior rules, so that on subsequent specific tasks the agent can obtain an approximately optimal policy with a small number of interactions. Experimental results show that the proposed DQDR algorithm achieves better sample efficiency than the PPO algorithm (Proximal Policy Optimization), the DQN algorithm (Deep Q Network), and the Heuristic DQN algorithm (Heuristic Deep Q Network).

(2) For tasks where it is difficult to provide structured prior rules, having experts provide demonstration data is a more common way to obtain a prior. To address the poor generalization and "catastrophic forgetting" that arise when the policy network is pre-trained directly on the demonstration data, this thesis proposes the RLBNK algorithm (Reinforcement Learning from demonstration via Bayesian Network-based Knowledge). In the knowledge extraction stage, the original demonstration data are first abstracted using a state abstraction method based on influence strength, and the abstracted demonstration data are then used for knowledge extraction with Bayesian networks. Based on the confidence of the decisions output by the resulting Bayesian network, the RLBNK algorithm effectively reduces the policy space that deep reinforcement learning has to search, and thereby achieves better performance than the DQfD algorithm (Deep Q-learning from Demonstration) and the Behavior Cloning algorithm.
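To make the knowledge extraction step in point (2) more concrete, the following is a minimal sketch in Python. It is not the thesis implementation: the abstraction function, the empirical conditional table standing in for the Bayesian network, and the confidence threshold are all illustrative assumptions.

```python
from collections import Counter, defaultdict
import numpy as np

# Hypothetical state abstraction: bucket each raw demonstration state into a
# coarse discrete key (a stand-in for the influence-strength-based abstraction).
def abstract_state(state: np.ndarray) -> tuple:
    return tuple(np.sign(state).astype(int))

def extract_knowledge(demos):
    """Estimate P(action | abstracted state) from (state, action) demonstration
    pairs. This empirical table is a simplified stand-in for the Bayesian network."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[abstract_state(state)][action] += 1
    return counts

def restrict_actions(counts, state, n_actions: int, confidence: float = 0.8):
    """Keep only the demonstrated action when its empirical confidence is high,
    otherwise leave the full action space to the RL agent."""
    key = abstract_state(state)
    if key in counts:
        total = sum(counts[key].values())
        action, freq = counts[key].most_common(1)[0]
        if freq / total >= confidence:
            return [action]            # prune the policy search space
    return list(range(n_actions))      # fall back to unrestricted exploration
```

In this reading, the RL agent queries restrict_actions(...) before choosing an action, so high-confidence prior decisions shrink the space it has to explore, which is the sense in which the extracted knowledge reduces the policy space.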
(3) To address the low sample efficiency of end-to-end deep reinforcement learning algorithms, this thesis proposes the SEN-DRL algorithm (State Encoder Network for Deep Reinforcement Learning), which uses state encoding to obtain the salient features of the pixel inputs of deep reinforcement learning. The algorithm first pre-trains a state encoder network (SEN) using, as a prior, high-dimensional state (image) data collected in advance and annotated with the low-dimensional physical states of the environment. During interaction, the trained SEN maps each high-dimensional image input to a low-dimensional state. The SEN-DRL algorithm thus decouples the two functions of state representation and policy learning in deep reinforcement learning, and feeds the low-dimensional state representation output by the SEN into the subsequent reinforcement learning algorithm for policy optimization (a brief illustrative sketch of this decoupling is given at the end of this abstract). The experimental results show that state encoding of high-dimensional images effectively reduces the sample requirements of deep reinforcement learning, and that the sample efficiency of the algorithm is better than that of the compared end-to-end deep reinforcement learning algorithms, including the PPO, DQN, and DDQN (Double Deep Q Network) algorithms.

In summary, this thesis investigates deep reinforcement learning algorithms based on prior knowledge extraction. To address the reliance of most current deep reinforcement learning algorithms on a large number of interactions with the environment, it proposes the DQDR and RLBNK algorithms, which incorporate policy priors, and the SEN-DRL algorithm, which incorporates environmental observation priors. The experimental results show that the proposed algorithms effectively improve sample efficiency during policy learning and enhance the generalization performance of the learned policies.
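As a closing illustration of the decoupling described in point (3), the following is a minimal sketch of a state encoder network and its supervised pre-training, assuming PyTorch. The architecture, input size, and training details are illustrative assumptions rather than the configuration used in the thesis.

```python
import torch
import torch.nn as nn

class StateEncoderNetwork(nn.Module):
    """Maps an image observation to a low-dimensional physical state
    (e.g. positions and velocities). Architecture is illustrative only."""
    def __init__(self, state_dim: int = 4, img_size: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the flattened feature size
            flat = self.conv(torch.zeros(1, 3, img_size, img_size)).shape[1]
        self.head = nn.Linear(flat, state_dim)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.head(self.conv(img))

def pretrain_encoder(encoder, images, phys_states, epochs: int = 10):
    """Supervised pre-training on image observations annotated with the
    low-dimensional physical states collected in advance."""
    opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(encoder(images), phys_states)
        opt.zero_grad()
        loss.backward()
        opt.step()

# During interaction the frozen encoder turns each image into a compact
# state vector, which is then fed to a standard low-dimensional RL agent:
#   state = encoder(img.unsqueeze(0)).detach()
#   action = agent.act(state)
```

The design point is only the separation of concerns: representation learning is handled by the pre-trained encoder, while the downstream RL algorithm optimizes the policy over the resulting low-dimensional states.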