| Since the “Internet +” concept was proposed, the online education industry has developed rapidly. As the mainstream form of online education, the MOOC is widely favored for its convenience and abundant resources. However, this rapid growth has also brought difficulties and challenges. Although MOOCs have a large user base, students often drop out during the learning process, which limits further development. To address the high dropout rate of MOOCs, it is of practical significance to study students’ behavior data in depth and to establish corresponding analysis models. The research content of this paper mainly includes the following aspects. (1) Based on an existing public MOOC dataset, the paper identifies students’ dropout trends, the main forms of student interaction with the MOOC platform, and the courses offered. The Pearson correlation coefficient is then used to analyze the correlation between students’ behavior and dropout, and the Apriori algorithm is used to mine association rules among courses. Experimental results show that students’ learning behavior and course selection information have a considerable impact on dropout. (2) A cluster analysis method for student groups based on PCA and K-means is proposed. Ten learning behaviors are extracted from the dataset to construct the input features of the model. After the data is normalized, Principal Component Analysis (PCA) is used to reduce its dimensionality, and the K-means algorithm is then used to cluster the students. The results show that MOOC students can be divided into three types: active students, passive students, and wait-and-see students. Based on the learning characteristics of each group, the paper puts forward suggestions for reducing dropout. (3) A dropout prediction method based on a time series model is proposed. Considering that the behavior data generated from the interaction 
between students and the MOOC platform has temporal characteristics, this paper constructs CNN-LSTM-ATT, a dropout prediction method based on a time series model. To address the cumbersome feature extraction of existing dropout prediction methods and their insufficient use of the temporal characteristics of student behavior data, the model first uses a Convolutional Neural Network (CNN) to automatically extract features from the input data, then uses a Long Short-Term Memory (LSTM) network to process the temporal information, and finally captures the important information through an attention mechanism, thereby achieving good prediction performance. Experimental comparisons with SVM, LR, DT, LSTM, and CNN-LSTM show that the proposed CNN-LSTM-ATT model improves both F1 and AUC. |
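
The correlation analysis in part (1) can be illustrated with a minimal sketch of the Pearson correlation coefficient between one behavior feature and the dropout label. This is not the paper's code: the function name `pearson`, the variable names, and the toy values are all hypothetical and chosen only to show the calculation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: video views per student and a 0/1 dropout label.
video_views = [12, 0, 7, 1, 9, 2]
dropped_out = [0, 1, 0, 1, 0, 1]

r = pearson(video_views, dropped_out)
print(f"correlation between video views and dropout: {r:.3f}")
```

In this toy data the students who watch more videos are the ones who stay, so the coefficient is negative. With a real dataset the same calculation would be repeated for each extracted behavior feature.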