Unlike industrial robots that operate in structured scenes, bionic and special-purpose robots must work in uncertain, unstructured environments, which poses great challenges to their motion-control methods. Because of the robot's complex structure, with characteristics such as multivariable coupling, strong nonlinearity, and underactuation, and because interaction with the environment introduces collisions, variable structure, and other factors, traditional nonlinear control methods suffer from difficult modeling, complex theory, and incomplete preset scenarios, and struggle to meet the motion demands of real scenes. How to endow a robot with the ability to learn its own motion control therefore has important theoretical value and practical significance.

Reinforcement learning avoids constructing complex mathematical models through trial-and-error interaction between the robot and its environment, but model-free reinforcement learning requires a large amount of interaction data, a drawback that model-based reinforcement learning avoids because a model is available. The model error in model-based reinforcement learning is nevertheless unavoidable; reinforcement learning based on probabilistic inference can compensate for the model uncertainty caused by this error while dispensing with large amounts of interaction data, and thus performs better. Although probabilistic-inference-based reinforcement learning accounts for the model uncertainty caused by model error, it uses a Gaussian process as the probabilistic dynamics model, whose predictions are inefficient; at the same time, updating the policy parameters requires extensive gradient computation, so the computational performance does not always meet practical demands. In response to these problems, this thesis proposes using K-nearest neighbors to build a local model and changing the policy-update method to
jointly improve computational efficiency. The main research contents of this thesis are:

1. First, to address the computational cost of Gaussian-process training and prediction, the K-nearest-neighbor algorithm is used to construct a local Gaussian process, and its computational performance is compared against several improved Gaussian-process methods, showing that the local Gaussian process improves overall computational performance.

2. To address the large-scale gradient computations on which policy evaluation relies, black-box optimization is proposed for the policy update, freeing policy evaluation from its dependence on gradient information; a Gaussian process is also adopted as the policy representation, enriching the form of the policy and improving the computational efficiency of policy evaluation.

3. Experimental verification is carried out in classic dynamics scenarios: a simulation environment is built and comparative experiments are conducted on both the probabilistic dynamics model and the policy-update method. The experiments show that the local Gaussian process proposed in this thesis predicts test inputs effectively while improving computational efficiency, and that the black-box policy-evaluation method combined with the composite Gaussian process also greatly improves overall computational efficiency.

4. Finally, comparative experiments and a timing analysis are carried out on the physical platform of a pneumatic bionic robot that is difficult to model and complicated to control. (The hardware platform mainly consists of pneumatic artificial muscles, vision sensors, and Ethernet analog input/output modules; the software environment mainly includes image-processing and reinforcement-learning algorithms.) The experimental results demonstrate the effectiveness of the proposed method on practical problems, and the timing analysis demonstrates the computational efficiency of the method in this
thesis.
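The local Gaussian process of contribution 1 is not fully specified in the abstract; the sketch below shows the general idea under stated assumptions (a squared-exponential kernel, Euclidean neighbor search; all function names are illustrative, not the thesis's own code). Each test query fits a Gaussian process only on its k nearest training points, so the cubic-cost solve runs on a k-by-k matrix instead of the full N-by-N one.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row-vector sets A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def local_gp_predict(X, y, x_star, k=20, noise=1e-2):
    """Predict at x_star with a GP fitted only on the k nearest neighbors.

    A full-GP solve costs O(N^3); restricting to k neighbors reduces this
    to an O(N) neighbor search plus an O(k^3) solve on the local subset.
    """
    # K-nearest-neighbor search under Euclidean distance.
    dists = np.linalg.norm(X - x_star, axis=1)
    idx = np.argsort(dists)[:k]
    Xk, yk = X[idx], y[idx]
    # Standard GP posterior mean and variance on the local subset.
    K = rbf_kernel(Xk, Xk) + noise * np.eye(k)
    k_star = rbf_kernel(Xk, x_star[None, :])
    mean = k_star.T @ np.linalg.solve(K, yk)
    var = rbf_kernel(x_star[None, :], x_star[None, :]) \
        - k_star.T @ np.linalg.solve(K, k_star)
    return mean.item(), var.item()
```

In a model-based RL loop, such a predictor would be queried once per state-action pair during rollouts, which is where the per-query cost reduction matters.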
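Contribution 2 replaces gradient-based policy evaluation with black-box optimization. The abstract does not name the optimizer used, so the following is a generic evolution-strategy sketch (all names are illustrative, and the quadratic test cost merely stands in for a rollout's expected cost): the policy parameters are updated purely from cost evaluations, with no gradient information.

```python
import numpy as np

def evolution_strategy(cost, theta0, sigma=0.5, pop=20, iters=100, seed=0):
    """Minimal evolution strategy: sample parameter perturbations, rank
    them by cost, and move the mean toward the best half. Only cost
    evaluations are needed, never gradients."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        eps = rng.normal(size=(pop, theta.size))      # candidate perturbations
        cand = theta + sigma * eps
        costs = np.array([cost(c) for c in cand])
        elite = cand[np.argsort(costs)[: pop // 2]]   # keep the best half
        theta = elite.mean(axis=0)                    # recombine elites
        sigma *= 0.97                                 # anneal the step size
    return theta
```

In the policy-search setting, `cost` would evaluate a parameter vector by simulating rollouts through the probabilistic dynamics model, so the update works even when the policy (e.g., a Gaussian-process policy) makes gradient derivation cumbersome.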