Offline deep reinforcement learning combines traditional deep reinforcement learning with offline learning and is one of the research hotspots in machine learning. An offline algorithm learns from a fixed dataset collected from previous task interactions, a property of high practical value in fields such as robotics and autonomous driving. Because the offline dataset usually cannot cover all state-action pairs, offline algorithms inevitably suffer from overestimation of action values, model bias, and unstable performance. To address these problems, this thesis contributes work in the following three aspects:

i. Episode-classified experience replay. In reinforcement learning, the episodic cumulative return is a complete evaluation of a sequence of actions taken by the agent. Traditional experience replay does not take the episodic return into account during network training, while prioritized experience replay reduces training efficiency because the priorities of experience samples must be updated at every stage of training. To address these problems, a deep deterministic policy gradient (DDPG) algorithm based on episode-classified experience replay is proposed: at storage time, experience samples are classified according to the cumulative return of the episode they belong to. Experiments show that, by making efficient use of past successful experience, the algorithm performs well on a variety of continuous control tasks.

ii. Meta-learning based initialization. Offline algorithms suffer from an incomplete training-data distribution: state-action pairs that are needed during training are missing or visited too rarely, which leads to unstable training results and makes the algorithm dependent on the distribution of the offline dataset. To address this problem, an offline deep reinforcement learning method based on meta-learning is proposed. A meta-learned set of initial network parameters improves the adaptability and learning ability of the network and alleviates the bias of the policy network model, so that the algorithm learns stably from a variety of datasets. Experimental results on continuous control tasks show that the algorithm is more robust.

iii. Classified replay of historical actions. The mainstream way to optimize offline algorithms is to constrain action selection through the network model, thereby controlling the distance between the behavior policy distribution and the target policy distribution; controlling the error generated in this way is also known as controlling extrapolation error. Inspired by this, and starting instead from the sampling process over the offline dataset, an offline deep reinforcement learning method with classified replay of historical actions is proposed. The method improves performance by improving the traditional experience replay used in offline algorithms: the offline dataset is divided into two parts, a historical-action priority dataset and the original dataset. The training process balances exploration and exploitation, suppresses extrapolation error from the perspective of experience replay, and compensates for the randomness and blindness with which offline deep reinforcement learning algorithms otherwise select experience. The algorithm achieves comparable training results and provides a new idea for optimizing offline algorithms.
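To make the classification-based replay in contributions i and iii concrete, the following is a minimal sketch of an episode-classified replay buffer; the class, parameter, and threshold names are hypothetical illustrations rather than the thesis's actual implementation. Completed episodes are filed into a high-return pool or an ordinary pool according to their cumulative return, and each training batch mixes samples from both pools.

```python
# Hypothetical sketch of episode-classified experience replay (not the thesis's code).
import random
from collections import deque

class EpisodeClassifiedReplayBuffer:
    """Collects transitions per episode; when an episode ends, all of its
    transitions are filed into a high-return or ordinary pool by cumulative return."""

    def __init__(self, capacity=100_000, return_threshold=0.0, high_ratio=0.5):
        self.high = deque(maxlen=capacity)       # transitions from high-return episodes
        self.ordinary = deque(maxlen=capacity)   # transitions from the remaining episodes
        self.current_episode = []
        self.return_threshold = return_threshold # assumed classification threshold
        self.high_ratio = high_ratio             # fraction of a batch drawn from the high-return pool

    def store(self, state, action, reward, next_state, done):
        self.current_episode.append((state, action, reward, next_state, done))
        if done:
            episode_return = sum(t[2] for t in self.current_episode)
            pool = self.high if episode_return >= self.return_threshold else self.ordinary
            pool.extend(self.current_episode)
            self.current_episode = []

    def sample(self, batch_size):
        n_high = min(int(batch_size * self.high_ratio), len(self.high))
        n_ord = min(batch_size - n_high, len(self.ordinary))
        batch = random.sample(list(self.high), n_high) + random.sample(list(self.ordinary), n_ord)
        random.shuffle(batch)
        return batch
```

Because the classification happens once, at storage time, no per-sample priorities have to be recomputed during training, which is the efficiency argument made for contribution i.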
Building on offline reinforcement learning, the three lines of work above attack problems such as overestimation of action values and model bias from different angles, and all achieve good experimental results.
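The abstract does not specify which meta-learning procedure builds the initial network parameters in contribution ii. As one possible illustration, the Reptile-style sketch below adapts a copy of the policy to each offline dataset with a few gradient steps and then moves the shared initialization toward the adapted parameters; the policy network, the dataset interface (sample_batch), and the behavior-cloning inner loss are assumptions made for the example.

```python
# Reptile-style sketch of meta-learning an initialization from several offline
# datasets; interfaces and losses are assumptions, not the thesis's method.
import copy
import torch
import torch.nn as nn

def inner_update(policy, dataset, steps=10, lr=1e-3):
    """Adapt a copy of the policy to one offline dataset with a few gradient steps."""
    adapted = copy.deepcopy(policy)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        states, actions = dataset.sample_batch()                  # hypothetical dataset interface
        loss = nn.functional.mse_loss(adapted(states), actions)   # e.g. a behavior-cloning loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted

def meta_initialize(policy, datasets, meta_steps=100, meta_lr=0.1):
    """Move the shared initialization toward the parameters adapted on each dataset."""
    for _ in range(meta_steps):
        for dataset in datasets:
            adapted = inner_update(policy, dataset)
            with torch.no_grad():
                for p, p_adapted in zip(policy.parameters(), adapted.parameters()):
                    p.add_(meta_lr * (p_adapted - p))             # Reptile outer update
    return policy
```

The intended effect, as described in contribution ii, is that starting offline training from such an initialization makes the learned policy less sensitive to the particular distribution of any single offline dataset.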