
Research On Prediction Of MOOC Dropout Based On Feature Engineering

Posted on: 2021-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhang
GTID: 2427330629488205
Subject: Applied Statistics
Abstract/Summary:
A MOOC (Massive Open Online Course) is a large-scale open online course. Because MOOCs offer features that traditional education lacks, such as data sharing, open curricula, learner autonomy and lifelong access, they have attracted tens of thousands of learners worldwide. They are also not limited by time or place. MOOCs realize a form of learning based mainly on self-study, make full use of high-quality teaching resources, and provide students with professional and personalized learning services, forming a complete and comprehensive teaching system. At the same time, this autonomy and freedom of choice lead to a very high dropout rate, which has become the main factor restricting the popularization and development of MOOCs.

To address this problem, it is necessary to understand learners' daily learning behaviors, analyze the behavior data statistically, and predict whether learners will drop out. Analyzing learning behavior to predict learners' trends and learning patterns can help teachers and platform managers understand how learners are progressing and take timely measures to reduce the dropout rate. In this paper, a learner who has not studied for 10 days is classified as a dropout; concretely, the label is determined by whether any log record exists in the 10 days following a given point in time. Modeling whether a learner drops out is therefore a binary classification problem.

This paper consists of two parts. The first part analyzes learners' learning behavior and uses feature engineering to extract three forms of features, which are then combined. The second part is dropout prediction, in which six different models are trained on the extracted features to predict whether a learner will drop out. The main tasks are as follows:

1. The 2015 data from Xuetang Online (XuetangX) is preprocessed, and a simple descriptive statistical analysis is performed on the data set. By analyzing the Pearson correlation coefficients of the learning behavior data, 5 valid behavioral events are selected. Three types of features with a total of 111 dimensions are then proposed: valid event features, quantitative features, and statistical features. These feature sets reflect the learning habits of MOOC learners from multiple angles and retain as much of the information in the original data as possible.

2. Three single models (binary logistic regression, support vector machine (SVM), and decision tree) and three ensemble models (random forest, AdaBoost, and GBDT) are used to predict learner dropout. Each model uses the default parameters. The F value, the AUC value, and their corresponding variances are selected as the evaluation criteria for comparing the prediction performance of the six classification models.

The experimental results show that, among the single models, binary logistic regression performs best: its AUC reaches 0.8620 and it has the shortest running time. The F value and AUC value of the SVM model are very close to those of binary logistic regression, but its training time is too long, reaching 0.5 h. The decision tree performs poorly in both training time and prediction performance. Among the ensemble models, the F values and AUC values of the three models are close to one another, but GBDT has the best prediction performance: its F value of 0.9240 and AUC value of 0.8863 are both the highest, and the corresponding variances are the smallest. A comprehensive comparison shows that the ensemble models are significantly better than the single models in terms of training time and prediction performance, and that GBDT performs best among the three ensemble models. The ensemble model GBDT is therefore chosen to predict learner dropout on the MOOC platform, which provides an effective approach for related research on predicting MOOC dropout rates.
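The dropout definition in the abstract (no log record in the 10 days after a given point in time) can be illustrated with a minimal Python sketch. The function names, the dictionary layout of the logs, the sample learner IDs and timestamps, and the choice to treat the window as the half-open interval (cutoff, cutoff + 10 days] are all assumptions made for illustration; the thesis does not specify its implementation or how it handles the window boundaries.

```python
from datetime import datetime, timedelta


def is_dropout(log_times, cutoff, window_days=10):
    """Return True if no log record falls in (cutoff, cutoff + window_days].

    log_times: iterable of datetime objects for one learner (hypothetical layout).
    cutoff: the reference point in time after which activity is checked.
    """
    window_end = cutoff + timedelta(days=window_days)
    return not any(cutoff < t <= window_end for t in log_times)


def label_learners(logs, cutoff, window_days=10):
    """Map each learner ID to a binary label: 1 = dropout, 0 = retained."""
    return {
        learner_id: int(is_dropout(times, cutoff, window_days))
        for learner_id, times in logs.items()
    }


# Made-up example data, not taken from the thesis data set.
cutoff = datetime(2015, 6, 30)
logs = {
    "u1": [datetime(2015, 6, 20), datetime(2015, 7, 3)],  # active inside the window
    "u2": [datetime(2015, 6, 25)],                        # no activity after the cutoff
    "u3": [datetime(2015, 7, 15)],                        # returns only after the window
}
labels = label_learners(logs, cutoff)
print(labels)
```

Because each learner receives a label of 0 or 1, the labels can serve directly as the target of a binary classifier, which matches the framing of the problem in the abstract.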
Keywords/Search Tags:learning behavior, feature engineering, SVR, GBDT, AdaBoost