At present, mobile robots are widely used in logistics, healthcare, the military, and other fields, where they replace humans in completing a variety of complex tasks; this places higher demands on their intelligence. As the core of mobile robot technology, scheduling and navigation have long been research focuses in both academia and industry. Scheduling refers to assigning tasks to mobile robots in order to improve work efficiency or maximize task revenue. Navigation refers to a mobile robot planning an optimal or suboptimal path from a starting position to a goal position while avoiding obstacles. Reinforcement learning is a distinct learning paradigm within machine learning with powerful policy learning capabilities: the agent continually interacts with the environment, obtains feedback from it, and adjusts its policy according to that feedback. These characteristics are of great significance for enabling mobile robots to learn autonomously, and they provide a feasible technical route for task scheduling and navigation. In this paper, the following research on scheduling and navigation is conducted using reinforcement learning:

1) Value function-based methods are a class of classic reinforcement learning algorithms, widely used because they suit discrete state-action spaces and have low computational complexity. They define the behavior policy implicitly by storing state-action values in a table; while interacting with the environment, the agent greedily selects the action with the highest value, so the value function determines the agent's decisions. Temporal difference learning is a common method for estimating value functions and is characterized by high data efficiency and low variance. In this paper, temporal difference learning is used to estimate the value function of mobile robot states, and this value function is then used to make scheduling decisions. Specifically, the task scheduling problem is solved through offline estimation followed by online planning: first, the value function is estimated with temporal difference learning; next, scheduling is transformed into a bipartite matching problem based on the value function; finally, a traditional combinatorial optimization algorithm is used to obtain the scheduling policy. Experimental results show that this method effectively improves both the matching rate and the total task revenue.

2) Navigation is a sequential decision-making problem, and an efficient navigation policy requires the mobile robot to take correct navigation actions in real time. Policy-based reinforcement learning methods provide an end-to-end solution to sequential decision-making problems. Compared with value-based methods, they define the behavior policy explicitly, which gives them an explicit optimization objective and a stable training process and makes them suitable for complex environments. Building on the proximal policy optimization (PPO) algorithm, this paper proposes a two-stage learning method that decomposes the complex learning objective to make the training process more stable. Experimental results show that the method effectively improves navigation ability and generalization performance.

3) Continuous actions can make the navigation trajectories of mobile robots smoother, while the proposed two-stage learning method may destroy the model parameters that were well trained in the first stage. To address these problems, this paper proposes a dual policy learning framework based on a deterministic policy gradient algorithm. The framework contains two independent action policies whose outputs are combined linearly before interacting with the environment. Experimental results show that the navigation trajectories generated by this method are more accurate and the training process is more stable.
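As an illustration of the offline value estimation step in contribution 1), the following is a minimal sketch of tabular TD(0) evaluation. It assumes logged transitions of the form (state, reward, next_state, done) over integer-indexed robot states; the function name `td0_evaluate`, the toy two-step log, and the step size and discount values are hypothetical choices for illustration, not the settings used in this paper.

```python
import random


def td0_evaluate(transitions, num_states, alpha=0.1, gamma=0.9, epochs=200, seed=0):
    """Tabular TD(0) estimate of V(s) from logged (s, r, s_next, done) transitions.

    Hypothetical helper: state indexing, alpha, gamma and epochs are assumptions.
    """
    rng = random.Random(seed)
    V = [0.0] * num_states          # one table entry per discrete robot state
    data = list(transitions)        # copy so shuffling does not mutate the caller's log
    for _ in range(epochs):         # replay the offline log several times
        rng.shuffle(data)
        for s, r, s_next, done in data:
            # TD target: reward plus discounted bootstrap, or just the reward at episode end
            target = r if done else r + gamma * V[s_next]
            # Move V(s) a fraction alpha of the way toward the target
            V[s] += alpha * (target - V[s])
    return V


# Hypothetical toy log: state 0 -> state 1 (reward 0), state 1 -> terminal state 2 (reward 1)
log = [(0, 0.0, 1, False), (1, 1.0, 2, True)]
V = td0_evaluate(log, num_states=3)
print([round(v, 3) for v in V])  # → [0.9, 1.0, 0.0]
```

Because each update bootstraps from the current estimate of V(s') instead of waiting for a complete return, every logged transition is reused many times and the estimate has lower variance than a Monte Carlo return, which is the property the text cites.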
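The online planning step of contribution 1), scheduling as bipartite matching weighted by the learned value function, can be sketched as follows. The edge weight used here (task revenue plus the discounted value of the state where the robot finishes, minus the value of its current state) is an assumed form, and `edge_weights`, `best_assignment`, and the step counts are hypothetical. The text does not name its combinatorial optimization algorithm, so this sketch swaps in exhaustive search over assignments, which is exact but only practical for a handful of robots.

```python
from itertools import permutations


def edge_weights(robot_states, tasks, steps, V, gamma=0.9):
    """Hypothetical weight for robot i taking task j:
    revenue_j + gamma**steps[i][j] * V[end_state_j] - V[robot_state_i]."""
    weights = []
    for i, s in enumerate(robot_states):
        row = []
        for j, (revenue, end_state) in enumerate(tasks):
            row.append(revenue + gamma ** steps[i][j] * V[end_state] - V[s])
        weights.append(row)
    return weights


def best_assignment(weights):
    """Exact max-weight bipartite matching by exhaustive search.

    Assumes #robots <= #tasks and that every robot receives a task; small instances only.
    """
    n_robots = len(weights)
    n_tasks = len(weights[0]) if weights else 0
    best_total, best = float("-inf"), ()
    # Each permutation assigns a distinct task column to every robot row
    for cols in permutations(range(n_tasks), n_robots):
        total = sum(weights[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_total, best = total, cols
    return list(enumerate(best)), best_total


V_table = [0.0, 0.5, 1.0]        # state values, e.g. from TD estimation
robots = [0, 1]                  # current state of each robot
tasks = [(1.0, 2), (1.0, 1)]     # (revenue, end state) per task
steps = [[1, 3], [2, 1]]         # steps for robot i to finish task j
W = edge_weights(robots, tasks, steps, V_table)
pairs, total = best_assignment(W)
print(pairs)  # → [(0, 0), (1, 1)]
```

For realistic fleet sizes, a polynomial-time assignment solver such as the Hungarian (Kuhn-Munkres) algorithm, or SciPy's `scipy.optimize.linear_sum_assignment(W, maximize=True)`, would replace the exhaustive loop while solving the same matching problem.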
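Contribution 2) builds on proximal policy optimization. For reference, PPO maximizes a clipped surrogate objective: the mean over samples of min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the probability ratio of the taken action under the new and old policies and A is the advantage estimate. The sketch below computes that standard objective in plain Python over a made-up batch; it does not reproduce the two-stage learning method, and the function name `ppo_clip_objective` is hypothetical.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized)."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                        # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip(ratio, 1 - eps, 1 + eps)
        total += min(ratio * adv, clipped * adv)         # pessimistic of the two surrogates
    return total / len(advantages)


# Made-up batch: one action became more likely (A > 0), one less likely (A < 0)
obj = ppo_clip_objective(
    logp_new=[math.log(1.5), math.log(0.5)],
    logp_old=[0.0, 0.0],
    advantages=[1.0, -1.0],
)
print(round(obj, 6))  # → 0.2
```

Clipping removes any incentive to push the ratio outside [1 - eps, 1 + eps], which keeps each policy update small; in practice the same expression is computed on tensors so that its gradient can update the policy network.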
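The linear combination step in contribution 3) can be sketched as below. The two policies are hypothetical stand-ins for trained deterministic actors, and the convex mixing weight and the [-1, 1] action bounds are assumptions; the text only states that the two independent action policies are combined linearly before interacting with the environment.

```python
def combined_action(state, policy_a, policy_b, weight=0.5, low=-1.0, high=1.0):
    """Mix two deterministic policies' actions linearly, then clip to the action bounds.

    Assumed form: weight * a + (1 - weight) * b; weight and bounds are hypothetical.
    """
    a = policy_a(state)
    b = policy_b(state)
    return [min(max(weight * x + (1.0 - weight) * y, low), high) for x, y in zip(a, b)]


# Hypothetical deterministic policies (fixed outputs, for illustration only)
def policy_a(state):
    return [0.8, 0.2]


def policy_b(state):
    return [0.4, -1.0]


action = combined_action([0.0, 0.0], policy_a, policy_b, weight=0.5)
print([round(v, 3) for v in action])  # → [0.6, -0.4]
```

Setting `weight` to 1.0 or 0.0 reduces the result to a single policy, so under this assumed form the weight controls how much each independently parameterized policy contributes to the continuous action.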