Model-free deep reinforcement learning (DRL) algorithms have been successfully applied to a range of challenging sequential decision-making and control tasks. Among them, the fixed-temperature Soft Actor-Critic (SAC) algorithm [1] achieves markedly stronger experimental results than competing algorithms. However, we identify theoretical problems arising from the entropy term introduced in the maximum entropy objective underlying SAC. Although this entropy term improves SAC's exploration-encouraging effect, it can also cause optimization deviation and Q value overestimation. We analyze how the maximum entropy objective gives rise to these two problems and formulate a modified framework that resolves them, yielding our Constrained Soft Actor-Critic (CSAC) algorithm, which removes the problems hidden in SAC while preserving the same exploration-encouraging effect. CSAC, however, exhibits a further problem we call the exploitation bottleneck, which manifests as instability in the trailing process. We therefore develop the Stable Constrained Soft Actor-Critic (SCSAC) algorithm to resolve the exploitation bottleneck underlying CSAC and improve stability in the trailing process. Finally, since the policy improvement theory of SCSAC has a potential problem in the process of finding the optimal policy, we develop the Further Revised Stable Constrained Soft Actor-Critic (FRSCSAC) algorithm to correct it.

In summary, all of our algorithms resolve the optimization deviation and Q value overestimation problems while retaining the same exploration-encouraging effect as SAC. Each algorithm is supported by extensive theoretical derivation and proofs, so we consider them theoretically complete. Last but not least, the training, trailing, and Q function overestimation experiments show that our algorithms significantly reduce the occurrence of Q function overestimation while achieving results comparable to SAC. We therefore believe our algorithms can be readily applied to real-world problems with appropriate modification.
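For context, the maximum entropy objective and the soft Bellman backup referred to above take the following standard forms in the fixed-temperature SAC formulation [1] (a sketch in the usual SAC notation, with temperature $\alpha$ fixed); the $\alpha$-weighted entropy term is the point where the optimum can deviate from the reward-only optimum and where extra value can enter the Q target:

```latex
% Maximum entropy objective: expected return plus an entropy bonus
% weighted by the fixed temperature \alpha.
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]

% Soft Bellman backup: the entropy term appears inside the Q target
% as the -\alpha \log \pi term.
Q(s_t, a_t) \leftarrow r(s_t, a_t)
  + \gamma \, \mathbb{E}_{s_{t+1}} \Big[
      \mathbb{E}_{a_{t+1} \sim \pi} \big[ Q(s_{t+1}, a_{t+1})
        - \alpha \log \pi(a_{t+1} \mid s_{t+1}) \big] \Big]
```

Because the entropy bonus is added directly to the reward signal in both expressions, the learned Q values and the induced policy optimize a shifted objective rather than the pure return, which is the source of the optimization deviation and the inflated Q targets discussed above.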