Study Of Time Series Classification Method And Its Application For Education Data

Posted on: 2021-12-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L J Yan
Full Text: PDF
GTID: 1487306350968579
Subject: Education IT
Abstract/Summary:
In recent years, education data has been growing exponentially. The results of mining and analyzing education big data play an increasingly important role in the research and practice of education and teaching, and have become a driving force that the field cannot ignore. Time series data is an important part of education big data. A time series is a sequence of values sampled from an object at different time points and arranged in time order. With the increase of online learning resources and the rise of Internet education, a large amount of time series data has been recorded in the field of education. Whether the learning environment is classroom teaching or online learning, a remarkable amount of information can be collected and processed by regularly tracking learning behavior in units of time. This time series data faithfully records the important information of each moment in the learning environment.

Time series classification is one of the important tasks of time series data mining, and its basic methods can also be used in regression and prediction tasks. Time series classification has very broad application prospects in the field of education. It can be used to recognize learners' behavior patterns, predict students' academic performance, and analyze the learning situation, giving full play to the advantages of personalized learning support and intelligent assisted teaching and helping the development of education informatization.

An education-oriented time series classification method should consider not only the particularities of time series themselves, such as high dimensionality, real-time arrival, random noise, and nonlinear relationships between data elements, but also the special needs of data analysis in the field of education. First, the effectiveness of the classifier should be considered: a classifier that generalizes poorly will lead to inaccurate learning intervention measures. Second, learning strongly discriminative features is very important in education applications. In addition to improving algorithm performance, such features help education decision makers and teachers understand learning and learning-related situations more deeply. Considering both the needs of data analysis in education and the particularities of time series, existing time series classification methods still have problems to be solved when dealing with time series data in the field of education. Focusing on these problems, this paper carries out two parts of research work.

In the first part, aiming at the problems that need to be solved in time series classification, three novel time series classification methods are proposed as follows.

First, traditional time series analysis methods place high requirements on the data samples, which need to meet assumptions such as normality, stationarity, and linearity. For time series in the field of education, the multi-level structure of the research object, the dynamic nature of the data, and differences in data sampling all increase the complexity of the series, so it is difficult for the data to meet these assumptions and the characteristics of education data samples cannot be well interpreted. When only the overall characteristics are captured, local and detailed characteristics implied in the time series are hard to reveal. Aiming at the problem that the global and local characteristics of time series cannot both be taken into account, we propose an ensemble algorithm that combines discrete wavelet analysis and shape similarity recognition of time series (DSE). The proposed method combines the advantages of the discrete wavelet transform (DWT) and the shapelet method. DSE embeds the wavelet transform into the shapelet extraction process and extracts shapelet information from the decomposed time-domain data instead of from the original time series. The DWT is a multi-resolution analysis tool: it decomposes the mixed signals of different frequencies interleaved in a time series into sub-signals of different frequency bands. Each component after decomposition and reconstruction reflects the global characteristics of the original time series in terms of approximation and detail. Considering the association between shapelets extracted from different components, DSE applies a weighted majority voting strategy, using the correlation between the decomposed time-domain components to weight the predictions of the base classifiers, and then obtains the final label. In this process, a Monte Carlo method is used to optimize the weight combination and obtain a locally optimal value. Experimental results show that this method has good generalization ability on different types of data sets.

Second, in education applications we hope to establish a classifier that is interpretable as well as accurate. Extracting strongly discriminative features is an important part of time series analysis in the field of education. Because time series are high-dimensional and lack clear features, it is difficult to construct interpretable classifiers. To solve this problem, we propose a novel feature reconstruction method for time series classification, referred to as interval feature transformation (IFT). The IFT uses perceptually important points to segment each series dynamically into subsequences of unequal length, and then extracts interval features from each subsequence as a feature vector. This kind of interval feature vector reflects the local characteristics of the time series and can be used as a basis for distinguishing time series. The IFT selects the top-k most discriminative feature vectors from a data set by information gain. Using these discriminative feature vectors, a transformation is applied to generate new k-dimensional data, which is a lower-dimensional representation of the original data. To verify the effectiveness of this method, we use the transformed data in conjunction with several traditional classifiers to solve time series classification problems and compare it experimentally with several state-of-the-art algorithms. The experimental results verify the effectiveness, noise robustness, and interpretability of the IFT.

Third, to address the problem that features cannot be selected adaptively in the work above, we propose a new time series similarity measure, an improved symbolic aggregate approximation similarity measure based on multiple features and vector frequency difference (SAX_VFD). First, we automatically optimize the feature combination according to the tightness of the lower bound. Using this feature combination, the original time series is mapped to the corresponding feature string vector. Then, we improve the distance measure of traditional SAX by taking the vector frequency difference as the weight of the different feature distances. To verify the effectiveness and efficiency of the method, we use the 1-NN algorithm to compare the different methods on public data sets. Experimental results show that the proposed SAX_VFD method has good classification accuracy and dimension reduction efficiency.

In the second part, the methods proposed in this paper are applied to a specific educational scenario: an applied study on recognizing online learners' participation patterns. Previous research on engagement evaluates learning outcomes, which ignores the learning process and easily interferes with learners. To address this, this paper proposes a framework that applies educational data mining technology to evaluate online learners' engagement automatically. The framework can be used to accurately evaluate learners' engagement in the learning process from massive online learning data, and it describes the overall process of evaluating engagement. First, using the learning data in an online learning management system, a clustering algorithm is applied and the clustering quality is evaluated to detect learners' engagement patterns. Then, according to the time series behavior data in the learning process, the time series classification algorithm based on interval features is used to mine and analyze the discriminative engagement characteristics of learners with different engagement patterns in a given learning environment. After fully discussing the applicability of the three methods proposed above, the improved interval feature transformation method is used to analyze the log data in the KDDcup2015 data set. Based on the analysis of the extracted discriminative features, corresponding intervention measures are proposed. The experimental results show that the framework can automatically identify the degree of learners' participation in the learning process and extract the discriminative characteristics of different participation patterns. It can provide data support for teaching intervention and greatly reduce the cost of learning support services.
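
The DWT step described for DSE splits a series into an approximation component and a detail component. The abstract does not say which wavelet is used, so the following is only a minimal sketch that assumes a single-level Haar wavelet; the function names `haar_dwt` and `haar_idwt` are illustrative and not taken from the dissertation.

```python
import math


def haar_dwt(series):
    """One level of the Haar discrete wavelet transform (assumed wavelet).

    Returns (approximation, detail) coefficient lists, each half the
    length of the input. The input length must be even.
    """
    if len(series) % 2 != 0:
        raise ValueError("series length must be even")
    scale = math.sqrt(2.0)
    approx = [(series[i] + series[i + 1]) / scale for i in range(0, len(series), 2)]
    detail = [(series[i] - series[i + 1]) / scale for i in range(0, len(series), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt and rebuild the original series."""
    scale = math.sqrt(2.0)
    series = []
    for a, d in zip(approx, detail):
        series.append((a + d) / scale)
        series.append((a - d) / scale)
    return series


# Hypothetical series, e.g. a learner's activity counts over 8 time steps.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_dwt(signal)
reconstructed = haar_idwt(approx, detail)
```

The approximation coefficients carry the coarse trend and the detail coefficients carry the local changes, which is the kind of decomposed data from which DSE is said to extract shapelets.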
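
The weighted majority voting with Monte Carlo weight optimization described for DSE can be sketched as follows. This is an assumption-laden illustration rather than the dissertation's procedure: the weights here are drawn by plain random sampling and scored by accuracy on labeled data, whereas the abstract ties the weights to the correlation between decomposed components. The names `weighted_vote` and `monte_carlo_weights` and the toy labels are hypothetical.

```python
import random


def weighted_vote(predictions, weights):
    """Combine base-classifier labels by weighted majority voting.

    predictions[c][i] is the label classifier c assigns to sample i.
    """
    combined = []
    for i in range(len(predictions[0])):
        scores = {}
        for clf_preds, w in zip(predictions, weights):
            scores[clf_preds[i]] = scores.get(clf_preds[i], 0.0) + w
        # Iterate labels in sorted order so ties break deterministically.
        combined.append(max(sorted(scores), key=lambda label: scores[label]))
    return combined


def accuracy(predicted, truth):
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def monte_carlo_weights(predictions, truth, n_trials=500, seed=0):
    """Random search over normalized weights; keeps the best combination seen."""
    rng = random.Random(seed)
    n_clf = len(predictions)
    best_w = [1.0 / n_clf] * n_clf  # start from uniform weights
    best_acc = accuracy(weighted_vote(predictions, best_w), truth)
    for _ in range(n_trials):
        raw = [rng.random() for _ in range(n_clf)]
        total = sum(raw)
        if total == 0.0:
            continue  # degenerate draw, skip it
        w = [r / total for r in raw]
        acc = accuracy(weighted_vote(predictions, w), truth)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy data: three base classifiers, six samples (illustrative only).
truth = [0, 1, 1, 0, 1, 0]
predictions = [
    [0, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 1],
]
best_w, best_acc = monte_carlo_weights(predictions, truth)
```

Because the search only samples a finite number of weight vectors, the result is a good combination among those tried, which matches the abstract's wording of a local rather than global optimum.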
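
The segmentation step of the IFT can be illustrated with a small sketch of perceptually important point (PIP) detection. It assumes the common vertical-distance criterion for choosing PIPs and uses mean, standard deviation, and slope as interval features; the abstract does not list the actual distance measure or features, so both choices, and the function names, are assumptions.

```python
import statistics


def vertical_distance(series, left, right, i):
    """Vertical distance from point i to the line joining points left and right."""
    slope = (series[right] - series[left]) / (right - left)
    return abs(series[left] + slope * (i - left) - series[i])


def find_pips(series, n_pips):
    """Return sorted indices of up to n_pips perceptually important points."""
    pips = [0, len(series) - 1]  # the endpoints are always PIPs
    while len(pips) < n_pips:
        best_idx, best_dist = None, -1.0
        for left, right in zip(pips, pips[1:]):
            for i in range(left + 1, right):
                dist = vertical_distance(series, left, right, i)
                if dist > best_dist:
                    best_idx, best_dist = i, dist
        if best_idx is None:  # no interior points remain
            break
        pips.append(best_idx)
        pips.sort()
    return pips


def interval_features(series, pips):
    """(mean, standard deviation, slope) for each interval between adjacent PIPs."""
    features = []
    for left, right in zip(pips, pips[1:]):
        segment = series[left:right + 1]
        slope = (series[right] - series[left]) / (right - left)
        features.append((statistics.fmean(segment), statistics.pstdev(segment), slope))
    return features


# Hypothetical series with one sharp peak in the middle.
series = [0.0, 1.0, 2.0, 3.0, 10.0, 3.0, 2.0, 1.0, 0.0]
pips = find_pips(series, 3)
features = interval_features(series, pips)
```

Because the PIPs follow the shape of each series, the resulting intervals have unequal lengths, and each interval yields a small feature vector that could then be ranked, for example by information gain as the abstract describes.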
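
For context on SAX_VFD, the sketch below shows the traditional SAX representation that it builds on: z-normalization, piecewise aggregate approximation (PAA), and mapping segment means to symbols with equiprobable breakpoints of the standard normal distribution. It does not implement the multi-feature extension or the vector frequency difference weighting, whose details are not given in the abstract; the function names and the sample series are illustrative.

```python
import statistics


def znormalize(series):
    """Rescale a series to zero mean and unit standard deviation."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return [0.0] * len(series)
    return [(x - mean) / std for x in series]


def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each equal-width segment."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be divisible by n_segments")
    size = len(series) // n_segments
    return [statistics.fmean(series[i * size:(i + 1) * size]) for i in range(n_segments)]


def sax_word(series, n_segments, alphabet="abcd"):
    """Classic SAX: map each PAA segment mean to a symbol."""
    a = len(alphabet)
    # Breakpoints cut the standard normal into `a` equiprobable regions.
    normal = statistics.NormalDist()
    breakpoints = [normal.inv_cdf(k / a) for k in range(1, a)]
    word = []
    for value in paa(znormalize(series), n_segments):
        index = sum(value > b for b in breakpoints)
        word.append(alphabet[index])
    return "".join(word)


# Hypothetical series of length 8, reduced to a 4-symbol word.
series = [0.0, 0.0, 4.0, 4.0, 6.0, 6.0, 10.0, 10.0]
word = sax_word(series, 4)
```

A 1-NN classifier, as used in the abstract's experiments, would then compare such symbolic words with a SAX distance measure; SAX_VFD changes that distance by weighting the per-feature distances.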
Keywords/Search Tags:education, data mining, time series classification, feature extraction, time series representation