| With the rapid development of robotics, robots are now widely used in production and daily life. Legged robots are highly flexible and adaptable, and are widely used in field exploration, supply transportation, disaster rescue, and similar tasks. Compared with biped and quadruped robots, multi-legged robots have more leg degrees of freedom, a more stable center of gravity, and better movement flexibility and load capacity, and they have become a research hotspot in the field of legged robots. Traditional control methods for multi-legged robots generally use a rhythm controller, but such a controller is difficult to design and requires an accurate model of the robot. Because multi-legged robots have many internal couplings and a complex mechanical structure and control system, an accurate model is difficult to establish. Data-driven reinforcement learning, by contrast, can learn and optimize autonomously without an accurate model. This paper takes two multi-legged robots as research objects and uses reinforcement learning theory to study their locomotion control. The specific research contents are as follows: 1. To address long training time, slow data collection, and the unsatisfactory mobility of multi-legged robots, an early termination mechanism is introduced, early termination conditions are designed, and the deep deterministic policy gradient (DDPG) algorithm is improved. 2. Based on the improved reinforcement learning algorithm, the action space and state space are analyzed and designed according to the locomotion characteristics of the two multi-legged robots; gait reward functions adapted to a variety of terrain environments are designed according to the robots' movement targets and the different terrains, in order to explore the optimal gait and complete the design of the entire control policy. 3. Simulation models of the two multi-legged robots are built in the V-REP simulator, a variety of terrain environments are designed, and the improved reinforcement learning method is used to train gaits that adapt to these terrains. 4. To address the difficulty of sim-to-real transfer of the locomotion control policy caused by the gap between simulation and reality, the model accuracy of the multi-legged robots, the simulation refresh accuracy, and the robustness of the locomotion control policy are improved, and the policy generated in simulation is successfully transferred to the real multi-legged robots. |