| Robot technology, one of the most strategically important industries today, reflects a country's level of science and technology. As robot technology develops, application scenarios are becoming increasingly complex, and traditional fixed-task robots can no longer meet production requirements; robots therefore need more flexible reactions and more intelligent behavior. Acquiring and generalizing motor skills is an important way to endow robots with intelligence, and the acquisition method based on a framework combining imitation learning and reinforcement learning (LfDRL) is one of the most successful. However, how to autonomously complete new tasks from demonstrations under specific performance constraints remains an active research topic. Building on this framework for policy representation, imitation learning, and optimization in intelligent robot trajectory planning, this paper proposes a novel motor skill learning method based on improved locally weighted regression (iLWR), policy improvement with path integrals (PI²), and recombination of basis functions. Because the basis functions of the classical iLWR-PI² method are fixed during training and may not suit a new task, we add self-recombination of the basis functions and a double perturbation method for iLWR, so that the algorithm learns alternately in the dual space and gradually generalizes from familiar tasks to new ones. The content of this paper is as follows. First, it reviews the state of robotics research at home and abroad and surveys the related literature on imitation learning, reinforcement learning, and deep learning methods that build on them. It then introduces the fundamentals of robot kinematics, including forward and inverse kinematics and the D-H coordinate representation, and discusses the common problems of motion
decoupling and joint redundancy in robot research. Next, based on a study of imitation learning with DMPs-iLWR and DMPs-GMR, we summarize a unified framework for imitation learning. We then compare the advantages and disadvantages of iLWR-PI² and GMR-PI² and propose a new iLWR-PI² policy improvement method based on alternate learning in the dual space. This method searches for an optimal or suboptimal solution to the task by alternately optimizing the weight space and the basis space. Finally, we use SCARA, a 10-DOF planar linkage, NAO, and UR5 robots as experimental platforms to verify the algorithm. The first two were simulated only in MATLAB, while the latter two were verified on real hardware after simulation. The results show that the proposed algorithm achieves excellent performance on these platforms. |
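
The idea of alternating between the weight space and the basis space can be illustrated with a small toy sketch. It is not the algorithm of this paper: it assumes a one-dimensional target profile, a normalized Gaussian basis expansion, a mean-squared-error cost, and a simplified PI²-style reward-weighted averaging step. All function names, parameter values, and the greedy acceptance guard are hypothetical choices made for illustration only.

```python
import math
import random

def predict(x, weights, centers, width):
    """Normalized Gaussian basis expansion: sum_i psi_i(x) w_i / sum_i psi_i(x)."""
    psi = [math.exp(-width * (x - c) ** 2) for c in centers]
    total = sum(psi) + 1e-12  # avoid division by zero far from all centers
    return sum(p * w for p, w in zip(psi, weights)) / total

def cost(weights, centers, width, xs, targets):
    """Mean squared error between the expansion and the target profile (assumed cost)."""
    return sum((predict(x, weights, centers, width) - t) ** 2
               for x, t in zip(xs, targets)) / len(xs)

def pi2_step(theta, evaluate, sigma, n_samples, rng, sharpness=10.0):
    """One simplified PI²-style step: reward-weighted average of Gaussian perturbations."""
    noises = [[rng.gauss(0.0, sigma) for _ in theta] for _ in range(n_samples)]
    costs = [evaluate([t + e for t, e in zip(theta, eps)]) for eps in noises]
    lo, hi = min(costs), max(costs)
    # Low-cost perturbations receive exponentially larger weights.
    probs = [math.exp(-sharpness * (s - lo) / (hi - lo + 1e-12)) for s in costs]
    norm = sum(probs)
    candidate = [t + sum(p * eps[i] for p, eps in zip(probs, noises)) / norm
                 for i, t in enumerate(theta)]
    # Toy-only guard (an assumption): keep the update only if the cost does not rise.
    return candidate if evaluate(candidate) <= evaluate(theta) else theta

def alternate_learning(xs, targets, n_basis=5, width=25.0, iterations=60, seed=0):
    """Alternate perturbation-based updates of the weights and of the basis centers."""
    rng = random.Random(seed)
    centers = [i / (n_basis - 1) for i in range(n_basis)]
    weights = [0.0] * n_basis
    history = [cost(weights, centers, width, xs, targets)]
    for it in range(iterations):
        if it % 2 == 0:
            # Weight-space step: basis centers are held fixed.
            weights = pi2_step(
                weights, lambda w: cost(w, centers, width, xs, targets), 0.2, 20, rng)
        else:
            # Basis-space step: weights are held fixed.
            centers = pi2_step(
                centers, lambda c: cost(weights, c, width, xs, targets), 0.05, 20, rng)
        history.append(cost(weights, centers, width, xs, targets))
    return weights, centers, history

# Usage: fit one period of a sine wave and compare the first and last cost.
xs = [i / 49 for i in range(50)]
targets = [math.sin(2 * math.pi * x) for x in xs]
weights, centers, history = alternate_learning(xs, targets)
print(history[0], history[-1])
```

Because of the acceptance guard, the recorded cost never increases from one iteration to the next; a full PI² implementation would instead rely on trajectory rollouts and exploration-noise scheduling, which this sketch omits.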