Research On Reinforcement Learning Algorithms Via Recurrent Neural Networks And Recursive Least Squares

Posted on: 2022-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhao
Full Text: PDF
GTID: 2558306488492584
Subject: Software engineering
Abstract/Summary:
Reinforcement learning is a machine learning method that learns by interacting with an environment and aims to maximize the expected return in sequential decision-making tasks. Reinforcement learning algorithms based on recurrent neural networks (RNNs) often make better decisions in time-series tasks because RNNs retain historical information well. However, mainstream deep learning optimization algorithms suffer from low sample utilization and slow convergence, which restricts the development of reinforcement learning algorithms to a certain extent. This research focuses on the recursive least squares (RLS) algorithm, which optimizes linear systems with very high efficiency. By using RLS to solve for model parameters efficiently, the convergence of the RNN weight parameters in a reinforcement learning model can be accelerated, which reduces training time and improves sample utilization. The main research work and contributions are as follows:

(1) In RNNs, first-order optimization algorithms usually converge slowly, and second-order optimization algorithms commonly have high time and space complexity. To resolve these problems, a new mini-batch RLS optimization algorithm, called the RLS-RNN algorithm, is proposed. It combines the mini-batch learning mode widely used in deep learning with the RLS algorithm. Using the inactive linear output error in place of the conventional activation output error for backpropagation, together with the equivalent gradients of the weighted linear least squares objective function with respect to the linear outputs of the hidden layer, the proposed algorithm derives the mini-batch RLS solutions of the RNN parameters layer by layer. Two further approaches are presented to address the adaptation of the forgetting factor and the overfitting of the proposed algorithm. Simulation results on classification and prediction problems for sequential data show that the proposed algorithm converges faster than popular first-order optimization algorithms. The proposed algorithm is also robust to the choice of hyperparameters.

(2) In the Deep Recurrent Q-Network (DRQN) algorithm, optimization with mainstream first-order gradient descent suffers from high sampling cost, slow convergence, and poor stability. To solve these problems, two new DRQN algorithms are proposed that use the RLS-RNN optimization algorithm. The first algorithm feeds states from consecutive time steps into the RNN to predict state-action values, and uses the resulting value function to guide the agent's decisions. The second algorithm improves the experience replay model so that mini-batches of time-series samples are used to update the model parameters in the weight update stage. It also reduces the correlation between samples through a threshold mechanism and reduces oscillation and non-convergence. Simulation results show that, in decision tasks, the convergence speed and stability of both algorithms are better than those of DRQN trained with mainstream first-order gradient descent algorithms.

(3) In the RNN-based Advantage Actor-Critic (A2C) algorithm, the critic network is difficult to train, which makes A2C converge slowly and raises the cost of interaction and training. To solve this problem, two new RLS-based A2C algorithms are proposed. The first algorithm uses the RLS-RNN algorithm to optimize the critic network, combining historical interaction information with current state information at the input stage, so as to reduce the critic network's evaluation error for the actor network and improve the learning efficiency of A2C. The second algorithm introduces an experience replay model to provide richer samples for optimizing the critic network. The learnable parameters of both methods are updated by the RLS-RNN algorithm. Comparative experiments in a simulation environment show that the convergence performance and stability of the two proposed algorithms are better than those of mainstream A2C optimized by first-order gradient descent algorithms.
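
For readers unfamiliar with RLS, the sketch below shows the standard textbook recursion with a forgetting factor on a plain linear model. It is not the RLS-RNN algorithm of the thesis, which derives mini-batch solutions layer by layer for RNN parameters. The function names (`rls_update`, `fit_rls`), the initialization constant `delta`, and the toy data are assumptions made for illustration only.

```python
def rls_update(w, P, x, y, lam=0.99):
    """One recursive least squares step for a linear model y ~ w . x.

    w:   current weights (list of d floats)
    P:   current d x d inverse-correlation estimate (list of lists, symmetric)
    lam: forgetting factor in (0, 1]; smaller values discount old samples faster
    Returns the updated (w, P) without mutating the inputs.
    """
    d = len(w)
    Px = [sum(P[i][j] * x[j] for j in range(d)) for i in range(d)]  # P x
    denom = lam + sum(x[i] * Px[i] for i in range(d))               # lam + x^T P x
    k = [v / denom for v in Px]                                     # gain vector
    err = y - sum(w[i] * x[i] for i in range(d))                    # a priori error
    w_new = [w[i] + k[i] * err for i in range(d)]
    # P <- (P - k (P x)^T) / lam, using the symmetry of P
    P_new = [[(P[i][j] - k[i] * Px[j]) / lam for j in range(d)] for i in range(d)]
    return w_new, P_new


def fit_rls(samples, d, lam=0.99, delta=1e3):
    """Run RLS over (x, y) pairs, starting from w = 0 and P = delta * I."""
    w = [0.0] * d
    P = [[delta if i == j else 0.0 for j in range(d)] for i in range(d)]
    for x, y in samples:
        w, P = rls_update(w, P, x, y, lam)
    return w


# Hypothetical noise-free toy data generated from y = 2*x0 - 3*x1.
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -3.0), ([1.0, 1.0], -1.0), ([2.0, 1.0], 1.0)]
weights = fit_rls(data, d=2, lam=1.0)
print(weights)  # close to [2, -3]; a small bias remains from the finite delta
```

Each sample is consumed once and the weights are corrected in closed form, which is the property the abstract relies on when it contrasts RLS with first-order gradient descent. With `lam=1.0` the recursion reproduces a lightly regularized least squares fit; values below 1 weight recent samples more heavily.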
Keywords/Search Tags: reinforcement learning, recursive least squares, recurrent neural network, deep learning, algorithm research