The goal of reinforcement learning is to maximize cumulative extrinsic reward. Reward is the source of motivation for improving a reinforcement learning policy, but most tasks do not provide ideal dense extrinsic rewards. Exploration-oriented reinforcement learning and hierarchical reinforcement learning are commonly used to solve tasks with sparse extrinsic rewards, yet both have shortcomings. Reinforcement learning methods that rely on intrinsic motivation for exploration often compute the intrinsic reward through an overly complicated process, and most of them ignore the role a state plays within its own episode. Goal-based hierarchical reinforcement learning methods select goals blindly and lack guidance. To better address sparse extrinsic rewards, this thesis studies these specific problems and makes the following contributions:

(1) To address the complicated intrinsic-reward computation and the neglected role of a state within its episode, this thesis makes full use of that role and designs an intrinsic reward function with a relatively simple computation. The method does not need to estimate the agent's familiarity with states. Instead, it computes an intrinsic reward from the distances between the next state and the historical states of the same episode, and uses this reward to push the agent away from recently visited regions while also preventing it from looping (a minimal sketch of this episodic-distance bonus is given after the abstract). Experiments were conducted in discrete environments with sparse extrinsic rewards. The results show that the intrinsic reward function effectively improves the agent's exploration ability and thus solves sparse-reward tasks efficiently.

(2) To address blind, unguided goal selection in goal-based hierarchical reinforcement learning, this thesis proposes a goal selection method. The method quantifies the agent's mastery of a goal as the number of times that goal has been achieved: the higher the success count, the better the goal has been mastered. The selection probability of goals or trajectories with low success counts is appropriately increased, so that the policy focuses on goals that have not yet been mastered while still reviewing mastered goals in time to avoid forgetting. The method was applied to virtual goal selection in the hindsight experience replay algorithm, and experiments were conducted in continuous environments with sparse extrinsic rewards (a sketch of success-count-based goal sampling is also given below). The results show that the proposed method improves hindsight experience replay by providing it with more reasonable virtual goals.
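The following is a minimal sketch of an episodic-distance intrinsic reward as described in contribution (1). It assumes states are continuous vectors and uses Euclidean distance with a nearest-neighbor aggregation; the function name, the `scale` coefficient, and the choice of minimum distance are illustrative assumptions, not necessarily the exact formulation used in the thesis.

```python
import numpy as np

def episodic_distance_bonus(next_state, episode_states, scale=1.0):
    """Intrinsic reward from distances between the next state and the
    states already visited in the current episode.

    A larger distance to the previously visited states means the agent has
    moved away from the recently visited region, so the bonus is larger;
    revisiting a state (distance near zero) yields almost no bonus, which
    discourages looping.

    next_state:     1-D array, the state reached after the action.
    episode_states: list of 1-D arrays, states visited earlier in this episode.
    scale:          hypothetical coefficient weighting the bonus.
    """
    if not episode_states:
        return 0.0
    dists = [np.linalg.norm(next_state - s) for s in episode_states]
    # Aggregate with the distance to the nearest previously visited state
    # (an illustrative choice; other aggregations are possible).
    return scale * float(min(dists))
```

In training, such a bonus would be added to the sparse extrinsic reward, for example `r = r_ext + episodic_distance_bonus(next_state, episode_states)`, with the list of episode states reset at the start of every episode.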
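Below is a sketch of success-count-based goal sampling as described in contribution (2), intended for choosing virtual goals in hindsight experience replay. It assumes goals can be represented as hashable keys; the class name, the `smoothing` parameter, and the inverse-count weighting are illustrative assumptions that merely reproduce the stated idea of raising the selection probability of goals with low success counts.

```python
import numpy as np
from collections import defaultdict

class SuccessCountGoalSampler:
    """Samples goals with probability inversely related to how often they
    have already been achieved, so poorly mastered goals are selected more
    often while mastered ones are still revisited to avoid forgetting."""

    def __init__(self, smoothing=1.0):
        self.success_counts = defaultdict(int)
        self.smoothing = smoothing  # keeps weights finite for unseen goals

    def record_success(self, goal_key):
        """Call whenever the agent achieves the goal `goal_key`."""
        self.success_counts[goal_key] += 1

    def sample(self, candidate_goals, rng=np.random):
        """Pick one goal from `candidate_goals` (e.g. achieved states of a
        trajectory, reused as virtual goals for hindsight experience replay).
        Goals with low success counts receive higher selection probability."""
        weights = np.array([
            1.0 / (self.success_counts[g] + self.smoothing)
            for g in candidate_goals
        ])
        probs = weights / weights.sum()
        idx = rng.choice(len(candidate_goals), p=probs)
        return candidate_goals[idx]
```

The inverse-count weighting is only one way to realize the stated rule; any monotonically decreasing function of the success count would similarly bias sampling toward goals the policy has not yet mastered.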