Reinforcement learning (RL) is the process of learning how to act in an environment, given its states, so as to maximize the accumulated future reward. Combined with deep learning, deep reinforcement learning (DRL) has emerged and shown great potential in many fields, and it remains an active research area with many open problems. To address the poor performance of reinforcement learning on sparse-reward and long-horizon tasks, hierarchical reinforcement learning decomposes a task into several subtasks and trains multiple levels of agents collaboratively, but it struggles to learn a good high-level agent when the observation space is large. Reinforcement learning also requires extensive interaction with the environment to collect training samples, so distributed reinforcement learning has emerged to break this bottleneck through parallelism. This work identifies the following problems and proposes corresponding solutions.

1) In goal-conditioned hierarchical reinforcement learning, the high-level action is defined as selecting an observed state from the observation space as a subgoal, and the low-level agent then acts to bring the current observation to that subgoal. The subgoal search space of the high-level agent is therefore as large as the observation space of the low-level agent, so when the observation space is large, the high-level search space grows with it. To address this problem, we propose a dynamics-aware representation learning method that embeds the observation space into a latent space, and on top of it a Riemannian manifold optimization method that searches for subgoals in this latent space. Experiments show that, on MuJoCo tasks with visual observations, the method improves the success rate by a factor of 1.5 on average compared with searching for subgoals directly in the bounded space.

2) When the subgoal search problem in the latent space is solved with Riemannian gradient descent, a retraction step is used to constrain the subgoals to lie on or near the manifold, which perturbs the ordinary gradient update: if the retraction coefficient is too large it harms convergence, while if it is too small the manifold constraint is not satisfied well. To remove the influence of the Riemannian constraint on gradient descent, we propose a reparameterization method that transforms the constraint so that subgoal search in the latent space can be optimized with ordinary gradient descent. Experiments show that the reparameterization converges faster than Riemannian gradient descent; in higher-dimensional latent spaces, the baseline fails to converge, while our method still maintains its success rate.
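The following minimal sketch illustrates the contrast described in 2) between retraction-based Riemannian gradient descent and a reparameterized update. It is not the thesis implementation: it assumes a toy setting where the subgoal z is constrained to the unit sphere and the loss is the squared distance to a target latent state z_star; the actual manifold, loss, and step size in our method differ.

# Sketch (illustrative assumptions): Riemannian GD with retraction vs. a
# reparameterized update z = theta / ||theta|| on a unit-sphere constraint.
import numpy as np

rng = np.random.default_rng(0)
dim = 8
z_star = rng.normal(size=dim)
z_star /= np.linalg.norm(z_star)          # target latent subgoal on the sphere

def loss_grad(z):
    """Euclidean gradient of 0.5 * ||z - z_star||^2."""
    return z - z_star

def riemannian_step(z, lr=0.1):
    """Project the gradient onto the tangent space, take a step, then retract
    (normalize) back onto the sphere; the retraction perturbs the update."""
    g = loss_grad(z)
    g_tan = g - np.dot(g, z) * z          # tangent-space projection
    z_new = z - lr * g_tan
    return z_new / np.linalg.norm(z_new)  # retraction onto the manifold

def reparam_step(theta, lr=0.1):
    """Optimize an unconstrained parameter theta and map it onto the sphere
    via z = theta / ||theta||; plain gradient descent, no retraction needed."""
    norm = np.linalg.norm(theta)
    z = theta / norm
    g_z = loss_grad(z)
    # chain rule: dz/dtheta = (I - z z^T) / ||theta||
    g_theta = (g_z - np.dot(g_z, z) * z) / norm
    return theta - lr * g_theta

z = rng.normal(size=dim); z /= np.linalg.norm(z)
theta = z.copy()
for _ in range(200):
    z = riemannian_step(z)
    theta = reparam_step(theta)
print("riemannian loss:", 0.5 * np.sum((z - z_star) ** 2))
print("reparam    loss:", 0.5 * np.sum((theta / np.linalg.norm(theta) - z_star) ** 2))

In this toy setting both updates reach the target; the point of the reparameterized form is that the constraint is absorbed into the mapping z = g(theta), so the optimizer never has to trade off step size against staying on the manifold.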
3) In distributed reinforcement learning, learners and actors are placed on different compute nodes to accelerate the algorithm by parallelizing the simulation and training processes. The actors responsible for generating samples and the learners responsible for training the model exchange sample data through an experience replay buffer, which becomes a bottleneck of the distributed system in large-scale scenarios. To address this problem, we propose a distributed, decentralized approach that approximates the centralized experience replay buffer by placing a sub-buffer on each actor node, thereby removing the efficiency bottleneck of the centralized buffer structure. We also prove that the proposed distributed buffer is equivalent to the centralized buffer under a relaxed assumption that the samples are independent and identically distributed. Experiments show that with 16 actors our framework matches the training performance of the Ape-X framework with 32 or more actors.
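The structure described in 3) can be sketched as follows. This is a simplified, single-process illustration rather than the thesis framework: the Transition fields, buffer capacity, and sampling rule are assumptions, and the real system would place each sub-buffer on a separate actor node and fetch samples over the network.

# Sketch (illustrative assumptions): each actor keeps a local sub-buffer of its
# own transitions, and the learner approximates uniform sampling from one
# centralized buffer by first picking a sub-buffer (weighted by its size) and
# then a transition inside it.
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "obs action reward next_obs done")

class LocalBuffer:
    """Sub-buffer held by a single actor node."""
    def __init__(self, capacity=10_000):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

class DistributedReplay:
    """Learner-side view over all actor sub-buffers."""
    def __init__(self, local_buffers):
        self.local_buffers = local_buffers

    def sample(self, batch_size):
        # Weighting sub-buffers by size makes the two-stage draw match uniform
        # sampling over the union of all stored transitions.
        sizes = [len(b.data) for b in self.local_buffers]
        batch = []
        for _ in range(batch_size):
            buf = random.choices(self.local_buffers, weights=sizes, k=1)[0]
            batch.append(random.choice(buf.data))
        return batch

# Usage: 16 actors each fill their own sub-buffer; the learner samples batches.
actors = [LocalBuffer() for _ in range(16)]
for i, buf in enumerate(actors):
    for t in range(100):
        buf.add(Transition(obs=(i, t), action=0, reward=0.0, next_obs=(i, t + 1), done=False))
replay = DistributedReplay(actors)
batch = replay.sample(32)
print(len(batch), batch[0])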