
Analysis And Modeling For Evolution Rules Of Value Distribution And Decision-making Mechanism During Pigeons’ Learning

Posted on: 2022-09-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z H Li
Full Text: PDF
GTID: 1528306905994649
Subject: Control Science and Engineering
Abstract/Summary:
Compared with machine reinforcement learning, animals can achieve efficient learning from only a few samples and adaptively change their behavior in dynamic environments to make optimal decisions in complex situations. Studying the brain mechanisms of animal learning and decision-making contributes to the development of more efficient reinforcement learning and intelligent decision-making theory. In essence, animal learning is motivated by acquiring returned value, but how the brain encodes value, dynamically adjusts it, and efficiently uses it to make decisions during learning remains unclear. Therefore, in-depth research on the evolution of value distributions during animal learning, analysis of the underlying dynamic decision-making mechanism, and, on this basis, the establishment of more efficient brain-inspired reinforcement learning algorithms and decision-making models have significant research value and wide practical applicability.

To address these problems, the pigeon, a typical model animal for cognitive learning, was chosen for this study. The striatum (ST), a key value-encoding region, and the nidopallium caudolaterale (NCL), a key region for encoding behavioral decisions, were pinpointed as target areas for electrode implantation using brain-anatomy techniques. Animal behavioral learning and multichannel neural electrical signal analysis were combined to analyze in depth the evolution of value distributions and the decision-making mechanism during pigeon learning.

The main work is as follows. First, experimental paradigms for value learning and reversal learning were designed to characterize the pigeons' dynamic learning process. Multichannel local field potential signals were used to construct functional connectivity networks, whose clustering coefficients were extracted as neural encoding features representing value, and the distribution and evolution of value coding in the ST during learning were analyzed. Then, a multi-value comparative choice paradigm was designed to reveal how pigeons use value to form decisions, and the distribution and evolution of neural coding features in the NCL during dynamic decision-making were analyzed. Inspired by the pigeons' value learning and brain decision mechanisms, a dynamic decision-making method based on a value-distribution incremental learning algorithm was proposed, and its advantages for non-stationary Gaussian reward problems were verified on the multi-armed bandit problem in reinforcement learning. Finally, a dual-error-modulated reinforcement learning decision model based on value-distribution representation was established on top of this dynamic decision method, demonstrating its superiority on classical reinforcement learning problems.

The main innovative achievements are as follows:

1. The evolution law of the value distribution in the ST during pigeon learning was revealed, and it was further found that changes in the distribution of neural encoding features in the NCL during multi-value comparative choice effectively reflected changes in the pigeons' behavioral decisions. During value learning, the mean of the ST value distribution moved toward the direction corresponding to the returned value while its variance decreased, and the transition from the 'learning' to the 'acquired' stage was governed by a threshold. When the reward statistics were reversed, the value distribution moved in the opposite direction. During multi-value comparative choice, the variation of the distance between the value distributions of different actions in the NCL reflected the mechanism by which pigeons dynamically switched between exploration and exploitation of values: a larger overlap between the value distributions of different actions led to more random action choices, whereas a clearer separation led to a more pronounced action preference.

2. A new distributional value incremental learning algorithm based on variational inference expectation maximization (DVIL_VIEM) was proposed. Following the evolution law of the value distribution observed in pigeon learning, the mean and cognitive variance of the value distribution and the environmental reward variance were updated adaptively and iteratively through incremental learning with new samples. For the first time, learning stop and restart rules based on changes in the value distribution were established, achieving efficient learning of the value distribution and adaptive restart of learning when the reward statistics changed.

3. A new dynamic decision-making method based on the distributional representation of values (DDDRVs) was proposed, combining incremental learning of value distributions with gating on the distance between them. Inspired by the dynamic decision-making mechanism of the pigeon brain, the DVIL_VIEM algorithm updates the value distribution of the selected action within a Bayesian inference framework. When the distance between the value distributions of different actions exceeds a given gating threshold, the Thompson sampling policy switches to the greedy policy to speed up formation of the optimal policy. This provides a more efficient way to balance exploration and exploitation under limited sampling in reinforcement learning.

4. A new dual-error-modulated actor-critic (DEMAC) reinforcement learning decision model based on value distributions was proposed. Drawing on the collaboration between the value-encoding and decision-making brain regions of pigeons, the form of information interaction and the optimization targets between actor and critic were improved. The value distribution and the policy distribution are updated jointly according to the action prediction error and the value prediction error. During learning of the value and policy distributions, a dynamic detection mechanism for the 'learning' and 'acquired' phases is embedded, effectively improving the convergence speed and adaptability of the optimal learning policy.
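The abstract does not give pseudocode for the incremental value-distribution update in innovation 2. As a rough illustration only, assuming the value estimate is a single Gaussian updated by conjugate Bayesian inference, the stop/restart idea might be sketched as follows (the class name, thresholds, and the surprise-based restart test are our assumptions, not the dissertation's algorithm):

```python
import math

class GaussianValueLearner:
    """Hypothetical sketch of incremental distributional value learning:
    the value estimate is N(mu, s2), updated by a conjugate Gaussian rule,
    with simple stop ('acquired') and restart rules driven by the
    distribution itself."""

    def __init__(self, mu0=0.0, s2_0=1.0, tau2=1.0,
                 stop_var=1e-3, restart_k=3.0, restart_run=3):
        self.mu, self.s2 = mu0, s2_0        # mean, cognitive variance
        self.tau2 = tau2                    # environmental reward variance
        self.stop_var = stop_var            # 'acquired' threshold on variance
        self.restart_k = restart_k          # surprise threshold (std units)
        self.restart_run = restart_run      # surprises in a row to restart
        self._surprises = 0

    def update(self, r):
        # Surprise test: reward far outside the predictive N(mu, s2 + tau2)
        z = abs(r - self.mu) / math.sqrt(self.s2 + self.tau2)
        if z > self.restart_k:
            self._surprises += 1
            if self._surprises >= self.restart_run:
                self.s2 = 1.0               # inflate variance: restart learning
                self._surprises = 0
        else:
            self._surprises = 0
        if self.s2 <= self.stop_var:
            return                          # 'acquired' stage: learning stopped
        # Conjugate Gaussian update with known reward variance tau2
        prec = 1.0 / self.s2 + 1.0 / self.tau2
        self.mu = (self.mu / self.s2 + r / self.tau2) / prec
        self.s2 = 1.0 / prec
```

With stationary rewards the mean drifts toward the reward value while the variance shrinks; when the reward statistics reverse, repeated surprises inflate the variance and learning restarts, mirroring the reversal-learning behavior described above.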
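The Thompson-to-greedy gating in innovation 3 can likewise be sketched. Assuming each action carries a Gaussian value distribution and taking the distance measure to be the gap between the two leading means in pooled standard deviations (the function name, data layout, and distance measure are our assumptions):

```python
import math
import random

def dddrv_choose(arms, gate=2.0, rng=random):
    """Hypothetical sketch of the gated decision rule: while value
    distributions overlap, choose by Thompson sampling (exploration);
    once the two leading distributions separate by more than `gate`
    pooled standard deviations, act greedily (exploitation)."""
    ranked = sorted(arms, key=lambda a: a["mu"], reverse=True)
    best, runner = ranked[0], ranked[1]
    # Distance between the two leading value distributions, in std units
    d = (best["mu"] - runner["mu"]) / math.sqrt(best["s2"] + runner["s2"])
    if d > gate:
        return arms.index(best)             # distributions separated: greedy
    # Overlapping distributions: Thompson sampling keeps the choice stochastic
    samples = [rng.gauss(a["mu"], math.sqrt(a["s2"])) for a in arms]
    return samples.index(max(samples))
```

This reproduces the behavioral observation in innovation 1: overlapping value distributions yield random choices, while well-separated ones yield a consistent preference.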
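The dual-error idea in innovation 4 can be illustrated on a two-armed bandit. This is a deliberate simplification, not the dissertation's DEMAC model: a scalar critic is updated by the value prediction error, and the actor's preferences are updated by that error scaled by an action prediction error (chosen-action indicator minus current action probability); all names and parameters here are our assumptions:

```python
import math
import random

def demac_bandit_sketch(rewards=(0.2, 0.8), steps=2000, lr=0.1, seed=0):
    """Loose two-armed-bandit sketch of dual-error actor-critic learning.
    rewards: Bernoulli success probability of each arm."""
    rng = random.Random(seed)
    prefs = [0.0, 0.0]          # actor: action preferences (policy logits)
    v = 0.0                     # critic: scalar value baseline

    def policy():
        m = max(prefs)
        exps = [math.exp(p - m) for p in prefs]
        s = sum(exps)
        return [e / s for e in exps]

    for _ in range(steps):
        probs = policy()
        a = 0 if rng.random() < probs[0] else 1
        r = 1.0 if rng.random() < rewards[a] else 0.0
        delta_v = r - v                          # value prediction error
        v += lr * delta_v                        # critic update
        for i in range(2):
            # Action prediction error: chosen indicator minus current prob
            delta_a = (1.0 if i == a else 0.0) - probs[i]
            prefs[i] += lr * delta_v * delta_a   # actor update (dual error)
    return policy()
```

After training, the policy concentrates on the higher-reward arm; the product of the two errors drives the joint update of value and policy described above.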
Keywords/Search Tags: Animal Learning, Value Distribution, Neural Encoding, Reinforcement Learning, Dynamic Decision-Making Model