Research and Implementation of a Time Series Classification Algorithm Based on the Random Feature Sampling Method

Posted on: 2019-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: F S Meng
GTID: 2370330566998108
Subject: Computer Science and Technology
Abstract/Summary:
Time series classification is a key part of time series research. To improve both classification accuracy and algorithm efficiency, we propose a time series classification method based on feature sampling. The series in a time series dataset may have unequal lengths, so conventional machine learning and deep learning methods cannot be applied to them directly. Traditional time series classification algorithms fall into two kinds. Distance-based algorithms define a distance function and find the time series most similar to the query; DTW is the typical example. Feature-based algorithms extract the main feature information from the time series; the shapelet method is representative. Both kinds require a large amount of computation. The core question of this paper is therefore how to design a simple and efficient feature extraction method that allows machine learning methods to be applied to time series problems.

Feature sampling is adopted to transform time series of unequal length into a dataset of equal-length series. We design three sampling methods: simple random feature sampling, equal time interval sampling, and segment random feature sampling. Compared with shapelet feature extraction, feature sampling is easy to compute, and its feature extraction is nondestructive to the original time series. For the parameters of the feature sampling algorithm, we use an improved cross-validation method to determine the best values and improve classification accuracy.

An LSTM neural network is chosen as the classifier. Because an LSTM has memory characteristics, it has an inherent advantage for time series classification. The dataset produced by feature sampling is the input of the LSTM, and the output of the LSTM is the input of a softmax classifier. We also propose an incremental learning algorithm for the time series classification algorithm so that the model can adapt to new data. We test our algorithm on the UCR datasets, and the experimental results show that it achieves high accuracy on most of them.

With the rise of the Internet, the volume of time series data has exploded. Most time series classification algorithms run entirely in memory, however, so they are not suitable for massive time series datasets. To solve the problem of massive time series classification, this paper proposes a parallel algorithm, SFSC_MR (Segment Feature Sampling Classification Using Map-Reduce). We define a bitmap representation for time series and use it to distribute the massive data. In the Map-Reduce procedure, the data is divided by bitmap to guarantee load balance. The algorithm has two stages: a preprocessing stage, which computes the time series bitmaps and trains the model, and a query stage, in which time series are classified. The experimental results show that SFSC_MR can handle massive data effectively.
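The abstract names three sampling methods but does not spell out their details, so the following is only a minimal sketch of one plausible reading of each. The function names, the target length `k`, and the assumption that every series has at least `k` points are all hypothetical and are not taken from the thesis.

```python
import random

def _check(series, k):
    # Assumption of this sketch: a series must have at least k points.
    if k < 1 or len(series) < k:
        raise ValueError("need 1 <= k <= len(series)")

def simple_random_sampling(series, k, rng):
    """Pick k points uniformly at random, keeping their original time order."""
    _check(series, k)
    idx = sorted(rng.sample(range(len(series)), k))
    return [series[i] for i in idx]

def equal_interval_sampling(series, k):
    """Pick k points at roughly evenly spaced positions, first to last."""
    _check(series, k)
    n = len(series)
    if k == 1:
        return [series[0]]
    return [series[(i * (n - 1)) // (k - 1)] for i in range(k)]

def segment_random_sampling(series, k, rng):
    """Split the series into k contiguous segments; draw one random point from each."""
    _check(series, k)
    n = len(series)
    bounds = [i * n // k for i in range(k + 1)]  # segment i is [bounds[i], bounds[i+1])
    return [series[rng.randrange(bounds[i], bounds[i + 1])] for i in range(k)]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the demo is repeatable
    unequal = [[0.1 * t for t in range(n)] for n in (12, 30, 8)]
    # Every series is reduced to the same length k = 5, whatever its original length.
    equal = [segment_random_sampling(s, 5, rng) for s in unequal]
    print([len(s) for s in equal])
```

Under this reading, all three functions map a series of any length to exactly `k` values, which is what lets a fixed-input classifier consume the result. Segment sampling keeps the randomness of simple random sampling while guaranteeing that every part of the series contributes a point.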
Keywords/Search Tags: feature sampling, time series, classification, massive data, Map-Reduce, bitmap