
Feature Selection Method Of Time Series Based On Classification

Posted on: 2019-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: C Zeng
Full Text: PDF
GTID: 2370330545986963
Subject: Computer software and theory
Abstract/Summary:
With the continuing development of the economy and of computer technology, time series data are now used widely across many fields. A time series, as the name suggests, is a sequence of data points ordered in time, sampled at a fixed frequency over equal time intervals. Time series are characterized by large data volume, high dimensionality, and continuous updating. Moreover, in a multivariate time series every variable is itself time-ordered. These characteristics make research on time series challenging.

Feature selection is an important research direction in time series analysis, and it serves two purposes. First, it reduces the dimensionality of the original data by eliminating redundant and irrelevant data and keeping the features that perform best for classification. Second, the selected features serve as input to a classification model that predicts the labels of unseen data. To select features from time series data effectively, this thesis presents work on two aspects: (1) variable selection for multivariate time series; and (2) feature selection based on shapelets.

(1) Multivariate time series are difficult to mine because of their temporal nature, their high dimensionality, and the correlation between variables. Simply feeding all the original variables into a model not only increases training time but can also degrade the prediction model. Variable selection for multivariate time series is therefore important. To address this problem, this thesis proposes a classification-based variable selection method built on a variable evaluation criterion derived from the mean and standard deviation of the intra-class and inter-class distances. First, the variables are ranked according to this criterion. Redundant variables are eliminated according to the gray correlation coefficient between input variables, and the optimal variable subset is then selected. Experiments on commonly used multivariate time series datasets validate the method and show improved classification accuracy compared with existing methods.

(2) After the optimal subset of variables has been selected, features must be extracted from the time series corresponding to those variables. Because the extracted features are also highly redundant and differ in classification performance, feature selection is again necessary. Moreover, the features usually differ in length, which complicates feature selection. This thesis proposes a feature selection algorithm that uses shapelets as features and is based on hierarchical clustering. The method selects subsequences with good classification performance and filters out redundant subsequence features. First, a feature vector is obtained for each class, and candidate subsequences are generated from it. Second, the candidate shapelets are grouped by hierarchical clustering, and a feature subset is selected according to the clustering results and class separability. Experiments on UCR datasets verify the effectiveness of the method.
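
The variable selection procedure in part (1) can be illustrated with a minimal sketch. The abstract does not give the exact evaluation criterion, so everything below is an assumption made for illustration: the score is a Fisher-like ratio built from the mean and standard deviation of intra-class and inter-class Euclidean distances, redundancy is measured with the commonly used form of the grey relational grade (distinguishing coefficient 0.5) on min-max normalized series, and the threshold 0.9 is arbitrary. The function names `separability_score`, `grey_relational_grade`, and `select_variables` are hypothetical and are not taken from the thesis.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def separability_score(series, labels, eps=1e-12):
    """Assumed criterion (not the thesis formula): reward large inter-class
    and small intra-class distances, scaled by their spread.
    `series` holds one sequence per sample for a single variable; the data
    must contain at least one same-class and one different-class pair."""
    intra, inter = [], []
    for i, j in combinations(range(len(series)), 2):
        d = euclidean(series[i], series[j])
        (intra if labels[i] == labels[j] else inter).append(d)
    return (mean(inter) - mean(intra)) / (pstdev(inter) + pstdev(intra) + eps)


def min_max(values):
    """Rescale values to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def grey_relational_grade(ref, other, rho=0.5):
    """Mean grey relational coefficient between two equal-length sequences."""
    diffs = [abs(x - y) for x, y in zip(ref, other)]
    d_min, d_max = min(diffs), max(diffs)
    if d_max == 0:
        return 1.0  # identical sequences are maximally related
    return mean([(d_min + rho * d_max) / (d + rho * d_max) for d in diffs])


def select_variables(dataset, labels, redundancy_threshold=0.9):
    """dataset[s][v] is the series of variable v in sample s.
    Returns the kept variable indices (best score first) and all scores."""
    n_vars = len(dataset[0])
    columns = [[sample[v] for sample in dataset] for v in range(n_vars)]
    scores = [separability_score(col, labels) for col in columns]
    # Assumption: compare variables on their series concatenated over samples.
    flat = [min_max([x for s in col for x in s]) for col in columns]
    ranked = sorted(range(n_vars), key=lambda v: scores[v], reverse=True)
    selected = []
    for v in ranked:
        # Keep v only if it is not too closely related to a kept variable.
        if all(grey_relational_grade(flat[u], flat[v]) < redundancy_threshold
               for u in selected):
            selected.append(v)
    return selected, scores
```

The sketch returns every non-redundant variable in ranked order. Choosing how many of them form the final subset, for example by cross-validated classification accuracy, is left to the caller.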
Keywords/Search Tags:time series, feature selection, clustering, class separability, shapelets