| In recent years, massive open online courses (MOOCs) have come into wide use as technology has advanced, and the convenience and large-scale course offerings of MOOC platforms have attracted millions of users. However, not all users complete the courses they enroll in, and high dropout rates have been a persistent problem that has adversely affected the development of MOOC platforms. To reduce the dropout rate, the behavior of enrolled users needs to be analyzed so that their tendency to drop out can be detected as early as possible, which provides a reliable basis for adopting appropriate intervention strategies. In this study, an enrolled user who does not access the course within a week is considered a dropout, and the study focuses on two parts. The first part explores the relationship between users' learning behaviors and dropout, and the second part constructs a dropout prediction model trained on users' learning behavior data. The study is based on open MOOC datasets, which contain data from "Xue Tang X", the largest MOOC platform in China. In dropout analysis of MOOC platform users, different methods of feature construction and different choices of training algorithm lead to different prediction results. In sum, the main research done in this study includes the following parts: (1) Ensemble MOOC dropout prediction based on periodic and behavioral transfer features. Given the characteristics of the small KDD Cup 2015 dataset used in the experiments, this thesis proposes an ensemble MOOC dropout prediction model based on periodic and behavioral transfer features. Through data exploration and analysis, we found that the periodic features and behavioral transfer features of users' behaviors are effective in improving prediction performance. We therefore extract these features manually and train multiple machine learning algorithms on them as base learners; the features that perform well in the base learners are processed further and used to train multiple ensemble learning algorithms, and the final dropout prediction is obtained by weighting their outputs. The overall model uses a series-parallel ensemble approach. Experiments demonstrate that the model predicts better than traditional feature engineering-based dropout prediction models and that, with limited computing power, its performance is close to that of neural network-based dropout prediction models, which can provide more ideas for studying user learning behavior. Moreover, compared with neural networks, the proposed model is more interpretable, requires less computing power, and is more suitable for constructing dropout prediction models on small datasets. (2) Using the CFLD-ATTEN model to predict MOOC dropout. Because manual feature extraction is relatively costly in time and labor, and because interactions between features and the temporal nature of features also affect dropout, we propose the CFLD-ATTEN model. We use a convolutional neural network to extract features and a factorization machine to combine them, a long short-term memory (LSTM) network to extract the temporal features contained in the dataset, and finally a deep neural network to predict the result. In addition, because different features influence dropout to different degrees, we also use an attention mechanism to calculate feature weights. Furthermore, because class imbalance may affect the experimental results, we apply oversampling to the dataset. Considering that different ways of handling minority-class samples may affect model performance differently, this thesis uses both the SMOTE algorithm and the ADASYN algorithm to process the dataset. The experimental comparison shows that the CFLD-ATTEN model outperforms both the first proposed model and other deep neural network-based models. Because the ADASYN algorithm assigns different weights when sampling minority-class samples, it makes model optimization more effective, and the resulting F1 and AUC scores are superior to those of comparable models. |
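
The abstract does not define behavioral transfer features precisely. As a rough illustration only, the sketch below assumes they can be read as counts of transitions between consecutive event types in one user's time-ordered activity log. The event names in `EVENT_TYPES` and the function `transfer_features` are hypothetical and are not taken from the thesis or from the dataset schema.

```python
from collections import Counter
from itertools import product

# Hypothetical event vocabulary (an assumption); the real log schema may differ.
EVENT_TYPES = ("video", "problem", "discussion")


def transfer_features(events, event_types=EVENT_TYPES):
    """Count transitions between consecutive events in one user's time-ordered log.

    Returns one feature per ordered pair of event types, e.g. "video->problem".
    Pairs involving an event outside `event_types` are ignored.
    """
    # Pair each event with its successor and count how often each pair occurs.
    counts = Counter(zip(events, events[1:]))
    return {
        f"{src}->{dst}": counts.get((src, dst), 0)
        for src, dst in product(event_types, repeat=2)
    }


# Example: a single user's log, ordered by timestamp.
log = ["video", "video", "problem", "discussion", "video", "problem"]
features = transfer_features(log)
```

Under this reading, each user contributes a fixed-length vector (here nine counts) that could be concatenated with periodic features before training the base learners.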