| In recent years, with the rapid development of artificial intelligence and mobile Internet technology, intelligent education has made great progress, and data-driven personalized adaptive learning has become a hot topic in academic and educational circles. As an educational product of the new era, intelligent education has gradually moved to the center of the educational stage and into people’s daily lives. Online information resources are growing ever larger. For learners, the sheer volume of learning resources makes it difficult to select suitable material efficiently; for society, meeting users’ personalized educational needs is particularly important. At present, recommendation systems and recommendation techniques are used as tools for personalized educational resource recommendation, with exercises and courses as the main recommended content to guide users’ learning. However, current recommendation systems are mostly based on static methods such as content-based and collaborative filtering, which cannot capture enough user preference information, suffer from data sparsity and cold-start problems, and cannot give real-time feedback or update the recommendation strategy according to the interaction between users and the system. These problems reduce the effectiveness and accuracy of the recommendation system, leave it lacking in flexibility and adaptability, and prevent it from effectively meeting users’ long-term needs. At the same time, most existing reinforcement learning recommendation systems are trained on interaction feedback such as click records and item click-through rates, and do not consider users’ own cognitive level. In addition, when reinforcement learning is applied to recommendation, the agent’s reward function is currently defined manually. For a recommendation system, however, a uniform reward definition cannot accurately reflect the
satisfaction of users and fails to take into account differences in rewards among users. It is therefore necessary to personalize the reward definition according to each user’s cognitive diagnosis results and other information. On this basis, this thesis proposes a personalized question recommendation model based on reinforcement learning. The model aims to maximize the match between educational resources and users, takes learners’ knowledge level into account, and accurately meets users’ personalized needs so as to guide their learning dynamically, with a focus on the long-term improvement of learning outcomes. The specific work is as follows: (1) Knowledge tracing models such as DKT are used to model the current knowledge state of students, and the entire recommendation process is modeled as a Markov decision process, that is, the next recommendation for a student depends only on the student’s current knowledge state. At the same time, through a personalized reward definition, the reward for the agent’s exploration is expressed in terms of the individual student’s knowledge state. By organically combining the perception ability of deep learning with the decision-making ability of reinforcement learning, the reinforcement learning recommendation method is integrated into the students’ learning process, so as to achieve a sustained, long-term improvement in students’ knowledge level and learning ability. (2) A prioritized experience replay mechanism is adopted, and ideas from transfer reinforcement learning are used to propose an instance-based local transfer method, which performs instance-based recommendation sequence transfer for target students, that is, it selectively reuses instances or trajectory samples collected from the source task. For different students, the optimal recommendation sequence is
different. However, a common optimal subsequence exists for a given learning stage, so local trajectories rather than whole trajectories are transferred, which improves the initial performance of target task learning or accelerates learning and saves users the unnecessary cost of online exploration in the environment. (3) The overall structure of the system and the functions of each module are designed on the basis of a detailed analysis of the different needs of students and teachers. The system adopts a browser/server (B/S) architecture; the back end is developed in Python with the Django framework, and data are stored in a MySQL database, Redis, and the file system, among others. Following the detailed design of each functional module, PyTorch, NumPy, and other third-party libraries are used to integrate key technologies (1) and (2) into the system. A series of experiments is designed and carried out for these key technologies. The experimental results show that the reinforcement learning recommendation model combined with the user’s knowledge state better captures learner preferences and learns its policy effectively. Compared with other recommendation models, including reinforcement learning recommendation models, the model in this thesis performs better and focuses on the long-term improvement of learners’ learning outcomes. In addition, combining it with the local recommendation sequence transfer method further improves the reinforcement learning recommendation model. Finally, functional testing shows that the system designed in this thesis achieves its expected goals and meets the diverse needs of intelligent education users. |
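
The personalized reward in (1) and the local instance transfer in (2) can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: the knowledge state is taken to be a vector of per-concept mastery probabilities (such as a DKT output), the reward is taken to be the weighted mastery gain between consecutive states, and a source transition is reused when its starting state lies within a Euclidean radius of the target student's state. The names `Transition`, `weakness_weights`, `personalized_reward`, and `select_local_segments`, as well as the weighting scheme, are hypothetical and are not taken from the thesis.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Transition:
    state: List[float]       # assumed: per-concept mastery before the exercise
    action: int              # index of the recommended exercise
    next_state: List[float]  # assumed: per-concept mastery after the exercise


def weakness_weights(state: Sequence[float]) -> List[float]:
    """Hypothetical weighting: concepts further from mastery get larger weights."""
    gaps = [1.0 - m for m in state]
    total = sum(gaps)
    if total == 0:
        # Every concept is fully mastered: fall back to uniform weights.
        return [1.0 / len(state)] * len(state)
    return [g / total for g in gaps]


def personalized_reward(
    state: Sequence[float],
    next_state: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Assumed reward: weighted mastery gain between two knowledge states."""
    return sum(
        w * (after - before)
        for w, before, after in zip(weights, state, next_state)
    )


def select_local_segments(
    source_trajectory: Sequence[Transition],
    target_state: Sequence[float],
    radius: float,
) -> List[Transition]:
    """Keep only source transitions that start near the target student's state.

    This reuses a local part of the source trajectory instead of the whole
    trajectory; the Euclidean radius test is an illustrative choice.
    """
    return [
        t for t in source_trajectory
        if math.dist(t.state, target_state) <= radius
    ]
```

In such a setup, the selected transitions could be used to pre-fill the target student's replay buffer before online exploration begins, while the reward weights are recomputed from each student's own knowledge state.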