
Research On Robot Behavior Decision Model Based On Autonomous Modulatory Developmental Network

Posted on: 2022-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: K Yang
Full Text: PDF
GTID: 2480306323994869
Subject: Control Science and Engineering
Abstract/Summary:
Behavior decision-making is one of a robot's most basic and important functions and a prerequisite for more advanced capabilities. Although traditional behavior decision-making methods are widely used in practice, they still have various shortcomings: neural networks require long training times and large amounts of manually annotated data and lack the ability to learn autonomously, while deep reinforcement learning often suffers from slow convergence. In pursuit of further progress, researchers have turned to the study of decision-making in the biological brain. The main approach is to combine neurobiology with mathematical modeling to build bionic intelligent decision-making models, which can overcome many of the problems that traditional machine learning methods face in the same application domains. The most common application scenarios for brain-inspired behavior decision-making are mobile robot navigation and robotic arm motion planning. Drawing on the physiological structure of certain brain regions and the mechanisms of brain memory, this thesis extends the structural design of the developmental network model to improve its decision-making efficiency and applies it to mobile robot navigation.

Firstly, when a traditional developmental network performs a specific task in a given environment, it usually requires a large amount of manually annotated data, and when decision quality degrades because the environment changes, the network cannot continue to learn. To address this problem, this thesis simulates the logical structure of memory classification in the brain and the memory conversion mechanism between the hippocampus and the prefrontal cortex, improves the original developmental network model, and designs a behavior decision-making computational model capable of continuous incremental learning. On the one hand, the new model learns in a semi-supervised fashion, which both reduces the need for manually labeled samples and speeds up the convergence of the network's decisions during task execution. On the other hand, the long- and short-term memory conversion mechanism applied during the off-task process enables targeted autonomous learning in a specific environment: even if the environment changes, the network can independently adjust its decisions and gradually converge to a stable state. Simulation experiments verify the model's incremental learning ability and fast convergence.

Secondly, the model structure is improved on the basis of the modulatory developmental network to make it more autonomous. Neural nodes within the network can change dynamically, and the network learns continuously and independently in the environment without any training samples, with its different regions coordinating to complete the decision-making tasks of the online process. However, knowledge learned completely autonomously is often limited (the data learned come only from a specific environment), which reduces the agent's adaptability in a new map environment. To solve this problem, the neural activity of the human brain during resting-state memory playback is simulated and the off-task computation is modified: the network recalls specific information with a certain probability, reactivates certain memories, and generates new information from the information stored in the reactivated neurons (a feature-based data-mapping method is used to generate new data).
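The off-task replay described above can be read as a generative-replay loop: stored memories are reactivated with some probability, and each reactivated memory spawns a new synthetic sample via a feature-based mapping. The following Python sketch is only an illustration of that idea; the class names, the Gaussian jitter used as the feature mapping, and the reactivation probability are assumptions for the sketch, not the thesis's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class MemoryNeuron:
    """One neuron of the developmental network (sketch): stores a feature
    vector and the action label it was associated with."""
    def __init__(self, feature, action):
        self.feature = np.asarray(feature, dtype=float)
        self.action = action

def offtask_replay(neurons, p_reactivate=0.3, noise_scale=0.05):
    """Resting-state replay (sketch): each stored memory is reactivated
    with probability p_reactivate; a reactivated memory generates a new
    synthetic sample by a feature-based mapping -- here Gaussian jitter
    around the stored feature, an assumed stand-in for the thesis's
    feature-based data-mapping method."""
    generated = []
    for n in neurons:
        if rng.random() < p_reactivate:
            new_feature = n.feature + noise_scale * rng.standard_normal(n.feature.shape)
            generated.append(MemoryNeuron(new_feature, n.action))
    return generated

# Usage: replay enlarges the knowledge base without any new sensor data,
# which is what lets the network adapt to maps it has never visited.
memories = [MemoryNeuron(rng.random(8), a) for a in ("left", "right", "forward")]
memories += offtask_replay(memories)
```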
This replay mechanism greatly increases the efficiency with which the network learns new knowledge and gives it a more complete knowledge base, so the network adapts better to new, previously unlearned static or dynamic map environments. The experimental results verify the effectiveness of this completely unsupervised model in unknown environments: as the network learns, its adaptability in new environments improves substantially.

Finally, to allow the model to run in more complex environments (such as structured obstacle environments), the front part of the network structure is improved by adding a representation layer for high-level features. At the same time, to ensure that the model can still find the optimal path in this type of environment, a reverse-order replay mechanism for memory sequences is added to the model's off-task process; that is, the output-layer weights of the network are updated using the reward-discount idea from reinforcement learning. In general, the agent stores memory sequences and learns environmental information while performing tasks online, then replays those sequences in reverse order during the off-task process to discover the relationships between memory segments and thereby the best way to reach the goal. The simulation results achieve the expected effect: in a structured obstacle environment, the model converges to the shortest path after only one or two explorations.
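The reverse-order replay step can be sketched as a one-pass discounted-return update over a stored episode: starting from the goal, the reward is propagated backwards through the memory sequence, reinforcing the output weights of the states along the path. A minimal Python sketch under those assumptions follows; the transition record, discount factor, and learning rate are illustrative choices, not values taken from the thesis.

```python
import numpy as np

def reverse_replay_update(weights, episode, gamma=0.9, lr=0.1):
    """Replay a stored memory sequence in reverse order and update the
    output-layer weights with discounted returns, in the spirit of the
    reward-discount idea from reinforcement learning.

    weights : dict mapping (state, action) -> scalar output weight
    episode : list of (state, action, reward) recorded during on-task
              exploration, ordered from start to goal
    """
    g = 0.0  # discounted return accumulated backwards from the goal
    for state, action, reward in reversed(episode):
        g = reward + gamma * g
        old = weights.get((state, action), 0.0)
        weights[(state, action)] = old + lr * (g - old)
    return weights

# Usage: one successful exploration already propagates the goal reward
# along the whole path, which suggests why one or two explorations can
# suffice to lock in the shortest route in a structured obstacle map.
episode = [("s0", "right", 0.0), ("s1", "up", 0.0), ("s2", "up", 1.0)]
weights = reverse_replay_update({}, episode)
```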
Keywords/Search Tags:Hippocampus, memory playback, developmental network, behavioral decision-making, robot navigation