| The development of intelligent robots drives progress in machine learning algorithms, and mobile multi-robot systems, as a pillar industry developing in step with machine learning, are receiving increasing attention from industry. In practical surveys, replacing manual work with robots is gradually becoming the dominant demand. Real-time functions of the robot control system, such as collision avoidance and navigation, save operating time, reduce resource consumption, and speed up task completion. In unknown and complex terrain in particular, robots are equipped with autonomous cruising and target search capabilities so that the agents within the system can coordinate their actions and avoid obstacles in time. In this paper, the research on cooperative control of multi-robot systems mainly covers path planning control, formation control, target ring-around control, and containment control. The autonomous navigation problem of multi-robot systems in various environments is studied. By using reinforcement learning to constrain and strengthen internal decision-making, the robots achieve more autonomous performance. Meanwhile, automatic control theory is applied to establish formation motion constraints that regulate basic motion angles and directions, to design a dynamics model for multi-robot cooperative motion, and to speed up cooperation by reducing velocity input errors, so that trajectories are adjusted online and tracking control of target points and clusters is accomplished. This paper also considers the different motion states of multiple robots in specific scenarios, using reinforcement learning together with potential field information from the environment to obtain target guidance while avoiding collisions. This paper focuses on the following aspects of cooperative control of multi-robot systems: (1) For the path planning problem, the effect of the environment model with obstacles on the convergence of action
decisions is analyzed. A goal-oriented method based on potential field information is designed, and an improved Q-Learning (QL) algorithm based on policy learning is proposed for the reward sparsity problem in the path selection process. By exploiting the feedback mechanism of reinforcement learning to mitigate reward sparsity during path exploration, the algorithm is more stable in dynamic environments than the traditional formation path algorithm, and it resolves the convergence problems of obstacle avoidance and action decisions caused by reward delay. (2) For the formation control problem in complex dynamic environments, the leader-follower method is used to assign task indicators to robot formations, so that the multi-robot system achieves higher fitness and a higher task completion rate. Because the dimension of the training data grows with the number of robots, an adaptive hierarchical reinforcement learning algorithm is proposed. The global optimal path is decomposed and the sub-regions are reasonably divided. Local optimal paths are designed for the leading robot and the following robots, respectively, to eliminate the spatial dimensionality problem in the system. (3) For the target ring-around control problem of multi-robot formation systems, a distributed approach is used to provide communication support for robot formation collaboration. The robots combine reinforcement learning training with kinematic models to explore the optimal trajectory for approaching the target point. Since the dynamic target point is able to escape, a reasonable dynamic target tracking and formation strategy is designed based on the ring-around controller to achieve ring-around and formation control, and formation path conflicts between the following robots are handled. (4) For the containment control problem of multi-robot formation systems, a reinforcement learning reward function is designed to optimize the
containment motion controller. Control protocols are designed for the formation and containment control stages to train multi-robot formation and collaborative control. A speed controller observes speed and state information and outputs adaptive strategies conducive to fast containment control. A finite-time convergent Lyapunov function is constructed to verify system stability, and the stability conditions under which the multi-robot system achieves containment control are analyzed. |
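
As an illustration of item (1), the sketch below shows one common way in which potential field information can ease reward sparsity in Q-Learning: potential-based reward shaping, where the term gamma * phi(s') - phi(s) is added to the environment reward. This is not the improved QL algorithm proposed in this paper, only a minimal tabular example under stated assumptions. The 5x5 grid, the obstacle layout, the reward values, the hyperparameters, and all function names are hypothetical.

```python
import random

# Hypothetical grid world: layout, rewards and hyperparameters are
# illustrative assumptions, not values taken from the paper.
SIZE = 5
START, GOAL = (0, 0), (4, 4)
OBSTACLES = {(1, 1), (2, 1), (3, 3), (1, 3)}
ACTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]
GAMMA, ALPHA, EPSILON = 0.95, 0.5, 0.2


def potential(state):
    """Attractive potential: negative Manhattan distance to the goal."""
    return -(abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1]))


def step(state, action):
    """Apply an action; hitting a wall or an obstacle leaves the robot in place."""
    nxt = (state[0] + action[0], state[1] + action[1])
    inside = 0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE
    if not inside or nxt in OBSTACLES:
        return state, -1.0, False  # collision penalty
    if nxt == GOAL:
        return nxt, 10.0, True     # sparse goal reward
    return nxt, 0.0, False


def greedy_action(q, state):
    """Index of the action with the highest Q-value (unseen pairs count as 0)."""
    return max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))


def train(episodes=500, max_steps=100, seed=0):
    """Tabular Q-learning with potential-based reward shaping."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            # Epsilon-greedy exploration.
            if rng.random() < EPSILON:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy_action(q, state)
            nxt, reward, done = step(state, ACTIONS[a])
            # Shaping term gamma * phi(s') - phi(s) gives dense guidance.
            shaped = reward + GAMMA * potential(nxt) - potential(state)
            best_next = 0.0 if done else q.get((nxt, greedy_action(q, nxt)), 0.0)
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + ALPHA * (shaped + GAMMA * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the learned greedy policy from START and record visited cells."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, done = step(state, ACTIONS[greedy_action(q, state)])
        path.append(state)
        if done:
            break
    return path
```

Calling `greedy_path(train())` returns the sequence of cells visited by the learned policy. Because the shaping term depends only on the potential difference between successive states, it rewards progress toward the goal at every step while leaving the optimal policy of the underlying sparse-reward task unchanged.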