The Neural Mechanism Of Cognitive Control In The Reinforcement Learning Process

Posted on: 2021-04-05  Degree: Master  Type: Thesis
Country: China  Candidate: Z Z Huo  Full Text: PDF
GTID: 2435330605463069  Subject: Basic Psychology
Abstract/Summary:
Reinforcement learning is one of many forms of learning: learners improve their rate of learning and acquire the underlying rule through reinforcement. Cognitive control is essential to this process. Only through the planning, continuous guidance, adjustment and monitoring of behavior that cognitive control provides can learners perform well and reach the desired goal. Effective behavior monitoring includes internal monitoring (error detection) and external monitoring (external feedback). However, researchers have paid little attention to internal monitoring and to dynamic learning performance in reinforcement learning. Two questions in this field remain heavily debated: what kind of external reinforcement does reinforcement learning depend on, and what strategy do individuals use to adjust their behavior?

The probability selection task and the two-step decision-making task are common experimental paradigms in reinforcement learning research. The probability selection task has two parts, practice and test. By testing the rules acquired in the practice phase, one can examine the relationship between internal monitoring and external feedback during learning, and further investigate at which stage individual learning depends on external reinforcement and on which kind of external reinforcement. The two-step decision-making task consists of two stages. It mainly examines how the feedback outcome and transition type of the previous trial influence the first-stage choice on the next trial, so it can directly probe an individual's behavioral strategy and cognitive control performance during learning. Results from two-step decision-making tasks are mostly used to test the dual-system account of reinforcement learning, which distinguishes model-based from model-free learning. The model-based system reasons and predicts from the structure of the environment and the organism's current goal, so that behavior better serves that goal; it is a flexible system under cognitive control. The model-free system learns from previously rewarded experience; it is more economical (it consumes fewer cognitive resources), inflexible (it cannot respond promptly to environmental change) and automatic. The two-step decision-making task thus examines how cognitive resources are used and allocated in learning or behavioral decision-making in more complex environments. This study therefore explores the neural mechanism of cognitive control in reinforcement learning using the probability selection task and the two-step decision-making task.

In Experiment 1, an adapted probability selection paradigm was used to test the performance patterns of approach learners and avoidance learners in internal monitoring and external feedback, as well as the relationship between internal monitoring and external feedback during dynamic reinforcement learning. Three event-related potential (ERP) components, the error-related negativity (ERN), the feedback-related negativity (FRN) and the P300, could be used as indicators of internal monitoring, external feedback and behavior adjustment. The results showed that in the early learning period both approach learners and avoidance learners showed a larger FRN effect and a smaller ERN effect, with negative feedback eliciting a significantly larger response than positive feedback, whereas in the late learning period the ERN effect was larger and the FRN effect smaller. In addition, FRN and P300 amplitudes in the last stage were significantly smaller than in the first three stages, while ERN amplitudes were significantly larger. These results indicate a trade-off between the ERN and the FRN. Behavior adjustment was based mainly on negative feedback in the early learning stage, and participants had successfully mastered the rule by the last stage of learning.

In Experiment 2, the two-step decision-making task was used to investigate individuals' behavior patterns in each stage, and logistic regression analysis was used to determine the behavioral model (model-based or model-free) of each stage. Two time windows were analyzed in the feedback phase: an early window of 260-380 ms, corresponding to the FRN, and a later one, N460-620. Among the four learning stages, behavior in the second stage was purely model-based, and the third stage was neither model-based nor model-free. The first stage and the third stage showed a model-free behavior pattern. For the FRN, the main effect of feedback valence was significant, with negative feedback eliciting significantly more negative amplitudes than positive feedback; the main effect of stage was also significant, with FRN amplitude increasing gradually across successive stages. For the later N460-620, there were significant main effects of feedback valence and of stage, and also a two-way interaction between feedback valence and stage: the amplitude for positive feedback decreased significantly in block 2 and block 4. There was also a three-way interaction among transition type, feedback valence and stage: the amplitude for the less negative feedback decreased significantly in block 3. This suggests that individuals take longer to process information in more complex experimental tasks.

Based on these results, the study drew the following conclusions: 1) There was a trade-off between internal monitoring and external feedback. 2) External feedback played its role in the early stages of learning, and behavior adjustment was based mainly on negative feedback early in learning. 3) Model-based and model-free learning played different roles at different stages of learning. In the initial stage, model-free learning played an important role and cognitive control was weak. In the later stage, model-based learning became involved, bringing individual behavior more in line with the demands of the environment and strengthening cognitive control. 4) For more complex experimental tasks, individuals took longer to process information.
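
The logic behind the stay/switch analysis in Experiment 2 can be illustrated with a minimal sketch. It is not the thesis's analysis code. It assumes the standard reading of the two-step task, in which a model-free learner repeats rewarded first-stage choices regardless of transition type (a main effect of reward), while a model-based learner shows a reward by transition interaction. The field names `choice`, `transition` and `rewarded` and both function names are hypothetical, and raw stay-probability contrasts stand in for the logistic regression coefficients.

```python
from collections import defaultdict


def stay_probabilities(trials):
    """P(repeat the first-stage choice), keyed by the previous trial's
    (rewarded, transition) condition. Each trial is a dict with the
    hypothetical keys 'choice', 'transition' ('common' or 'rare') and
    'rewarded' (bool)."""
    counts = defaultdict(lambda: [0, 0])  # condition -> [stays, total]
    for prev, nxt in zip(trials, trials[1:]):
        key = (prev["rewarded"], prev["transition"])
        counts[key][0] += int(nxt["choice"] == prev["choice"])
        counts[key][1] += 1
    return {key: stays / total for key, (stays, total) in counts.items()}


def strategy_indices(p):
    """Descriptive contrasts on stay probabilities (all four conditions
    must be present in p, otherwise a KeyError is raised).
    model_free  : main effect of reward.
    model_based : reward x transition interaction."""
    rc, rr = p[(True, "common")], p[(True, "rare")]
    uc, ur = p[(False, "common")], p[(False, "rare")]
    model_free = ((rc + rr) - (uc + ur)) / 2
    model_based = ((rc - rr) - (uc - ur)) / 2
    return model_free, model_based


if __name__ == "__main__":
    # Toy sequence: the agent stays after reward and switches after
    # non-reward, ignoring transition type (a purely model-free pattern).
    demo = [
        {"choice": "A", "transition": "common", "rewarded": True},
        {"choice": "A", "transition": "rare", "rewarded": True},
        {"choice": "A", "transition": "common", "rewarded": False},
        {"choice": "B", "transition": "rare", "rewarded": False},
        {"choice": "A", "transition": "common", "rewarded": True},
    ]
    print(strategy_indices(stay_probabilities(demo)))
```

Computing the two indices separately for each learning stage would give a rough picture of which strategy dominates in that stage. A regression on trial-level stay/switch data, as reported in the abstract, additionally provides significance tests.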
Keywords/Search Tags: reinforcement learning, cognitive control, probability selection task, two-step decision-making task, reward