Font Size: a A A

Operational Tasks Learning Based On Multi-info And State Similarities

Posted on:2021-05-11Degree:MasterType:Thesis
Country:ChinaCandidate:X L LiuFull Text:PDF
GTID:2428330605468059Subject:Control Science and Engineering
Abstract/Summary:PDF Full Text Request
In the home environment,robots with multiple operating skills can complete more complex housekeeping tasks and provide users with a better service experience.Existing skills learning methods require a large amount of training data and require high hardware.In particular,the skills already learned cannot be used to accelerate the training of new skills.It needs much training time and is difficult to converge.In order to allow robots to quickly learn different skills in the environment,there are two key issues,one is how to efficiently and concisely represent complex environmental states,and the other is how to achieve knowledge sharing between different operational tasks.To deal with these problems,this paper proposes a fast skill learning algorithm for different operation tasks.The advanced features of the states extracted by the Multi-Info network are used as input data of the skill strategy network.Sharing similar state data achieves the purpose of knowledge sharing,and uses an improved policy gradient reinforcement learning algorithm to achieve fast skill learning of operation tasks.Specific research contents and innovations include:Firstly,network structures of two state representation learning models are designed.For different operation tasks,the state data set is firstly collected in the environment using a randomly initialized skill strategy,and then the state representation learning model based on the Auto-Encoder method is used to extract the common and static features,followed by training the state representation learning model that satisfies five types of prior-knowledge to extract advanced dynamic features of the state.Finally these features can accelerate the training process of the skill strategy model,laying a foundation for rapid skill learning.Secondly,this paper proposes a mixed strategy of multi-feature weights with contextual dynamic information.It combines the static and dynamic features of the two state representation learning models.First,batch standardization of feature data of different models within a fixed time window is performed to obtain uniformly distributed feature data,and then add the reward value obtained after the robot takesthe action in the environment as a paranoid term in the mixed feature,which can increase the weight of the state feature with a high reward value and reduce the weight of the state feature with a low reward value,and finally obtain the mixed feature It can effectively solve the problem of weight between different model features and retain the context dynamic information of the features,making the model training process of skill strategies more rapid and stable.Thirdly,this paper proposes a sample similarity calculation method for efficient knowledge sharing between skill strategies of different operation tasks.The improved policy gradient reinforcement learning algorithm is used to train the skill learning model.First,the manipulator strategy model is trained until converge in one operational task environment.For the training of strategy models for new operating tasks,using features with context information in a fixed time window as the calculation vector of similarity scores.Then add similar sample data generated by the old strategy to the training of the new strategy model to achieve the purpose of knowledge sharing.In summary,the improved policy gradient algorithm based on multi-features and state similarity proposed in this paper solves the problems of efficient feature extraction of environmental states and efficient knowledge sharing between skill strategies of different tasks.It can not only achieve a single operational tasks,such as the skill learning of clicking the static button and the dynamic movement button,and can achieve knowledge sharing between the skill strategies of two different operational tasks.The results of simulation experiments show that the method proposed in this paper can not only accelerate the training of skill strategies for operational tasks,but also improve the overall performance of strategies.Compared with the basic algorithm,it achieves the highest task success rate and average reward value.
Keywords/Search Tags:operational tasks, skilling learning, knowledge sharing, feature fusion, reinforcement learning
PDF Full Text Request
Related items