| With the rapid development of information and Internet technology, more and more students acquire and consolidate knowledge through online learning. Doing practice items is an effective way to test and improve students’ learning outcomes, and a suitable reward can motivate students to do them. This thesis models the process in which students do items and earn scores as a Markov Decision Process (MDP) and designs the scoring rewards by studying how the MDP reward function is set. The main work covers three aspects. (1) Two objective factors that affect students’ scores are analyzed, namely the difficulty of each item and the time spent on it, and an initial reward function is designed from them. A subjective factor, the student’s confidence index, is then introduced to represent how confident the student is of answering an item correctly, and a proper scoring rule, the logarithmic scoring rule, is adopted so that students have an incentive to report a confidence index that reflects their true level. Based on item difficulty, time spent per item, and the confidence index, a scoring scheme that motivates students to do items is proposed. (2) To speed up the convergence of learning while preserving the optimal policy of reinforcement learning, a reward shaping scheme based on a dynamic potential function is proposed. Theoretical derivation proves that the scheme preserves the optimal policy and establishes the equivalence between the shaping function based on the dynamic potential and the initial reward function. Five item-doing reward schemes, namely the proposed scheme and four classical reinforcement learning reward schemes, are compared in simulation experiments. Comparing the optimal policy and the average number of steps at convergence under each scheme demonstrates the effectiveness of the proposed scheme. (3) An online item-doing system for students is designed and developed, covering requirements analysis, overall architecture design, and detailed design of several key modules and database tables. The implemented functions include online assessment with scores awarded under the reward scheme proposed in this thesis, answer sheet analysis, and basic management of items, knowledge points, and user information. By applying reinforcement learning and reward shaping to the design of score rewards for online item doing, this thesis makes it possible to test students’ learning achievements and give them personalized score rewards, and it offers a way to encourage students to participate actively in online learning and item doing. The approach also helps promote individualized education for students. |
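
The abstract states that a logarithmic scoring rule leads students to report a confidence index matching their true level. The following is a minimal sketch of why that holds, assuming the textbook form of the rule (score `log(p)` for a correct answer and `log(1 - p)` for a wrong one). The thesis may scale or shift this score and combine it with item difficulty and time spent; none of that is modeled here, and the function names `log_score`, `expected_score`, and `best_report` are hypothetical.

```python
import math


def log_score(confidence: float, correct: bool) -> float:
    """Textbook logarithmic scoring rule (assumed form, not taken from the thesis)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Reward log(p) when the answer is right, log(1 - p) when it is wrong.
    return math.log(confidence) if correct else math.log(1.0 - confidence)


def expected_score(reported: float, true_prob: float) -> float:
    """Expected score when the student reports `reported` but is
    actually correct with probability `true_prob`."""
    return (true_prob * log_score(reported, True)
            + (1.0 - true_prob) * log_score(reported, False))


def best_report(true_prob: float, grid_size: int = 99) -> float:
    """Search an evenly spaced grid of reports (0.01 .. 0.99 by default)
    for the report with the highest expected score."""
    grid = [i / (grid_size + 1) for i in range(1, grid_size + 1)]
    return max(grid, key=lambda q: expected_score(q, true_prob))
```

Under these assumptions, a student who is right 70% of the time maximizes the expected score by reporting 0.7: overstating (for example 0.9) or understating (for example 0.5) both lower `expected_score`. This is the property that makes the rule "proper".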