Feature Selection Method Of Time Series Based On Classification

Posted on:2019-06-23

Degree:Master

Type:Thesis

Country:China

Candidate:C Zeng

GTID:2370330545986963

Subject:Computer software and theory

Abstract/Summary:

With the continuous development of the social economy and computer technology, time series data are widely used in many fields. A time series, as the name suggests, is a sequence of data ordered by time and sampled at a given frequency over equal time intervals. Time series are characterized by large data volume, high dimensionality, and continuous updating. Moreover, in a multivariate time series each variable is itself ordered in time. These characteristics make research on time series challenging. Feature selection is an important research direction for time series and plays two roles in their analysis. First, it reduces the dimensionality of the original data by eliminating redundant and invalid data and selecting the features that perform better in classification. Second, the selected features are used as input to a classification model to predict unknown data. To select features of time series data effectively, this thesis presents research on two aspects: (1) variable selection for multivariate time series; (2) feature selection based on shapelets.

(1) Multivariate time series are difficult to mine because of their temporal nature, high dimensionality, and the correlation between variables. Simply using the original variables as model input not only increases training time but also degrades the prediction model. Variable selection for multivariate time series is therefore very important. To address this problem, this thesis proposes a classification-based variable selection method whose variable evaluation criterion is built on the mean and standard deviation of the intra-class and inter-class distances. First, the variables are ranked according to this criterion. Meanwhile, redundant variables are eliminated according to the gray correlation coefficient between input variables, and the optimal variable subset is then selected. Finally, experiments on commonly used multivariate time series datasets validate the effectiveness of the method, which improves classification accuracy compared with existing methods.

(2) After the optimal variable subset is selected, features must be extracted from the time series corresponding to these variables. Because the extracted features are also highly redundant and differ in classification performance, feature selection is necessary. Moreover, the features usually differ in length, which complicates feature selection. This thesis proposes a feature selection algorithm based on hierarchical clustering that uses shapelets as features. The method selects subsequences with good classification performance and filters out redundant subsequence features. First, the feature vector of each class is obtained, and candidate subsequences are generated from it. Second, the candidate shapelets are grouped by hierarchical clustering, and a feature subset is selected according to the clustering results and the class separability. Experiments on UCR datasets verify the effectiveness of the method.
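
The variable selection step in part (1) can be illustrated with a minimal Python sketch. The abstract does not give the exact evaluation formula, so everything specific here is an assumption made for illustration: the score (mean inter-class distance divided by the sum of the mean and standard deviation of the intra-class distance), the use of Euclidean distance, Deng's grey relational grade with rho = 0.5 as the "gray correlation" measure, and the hypothetical names `separability`, `grey_relational_grade`, `select_variables`, and `redundancy_threshold`. It is not the thesis's implementation.

```python
from itertools import combinations
from statistics import mean, pstdev


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def separability(series, labels):
    """Assumed criterion: large inter-class and small, stable intra-class distances score high."""
    intra, inter = [], []
    for i, j in combinations(range(len(series)), 2):
        d = euclidean(series[i], series[j])
        (intra if labels[i] == labels[j] else inter).append(d)
    return mean(inter) / (mean(intra) + pstdev(intra) + 1e-12)


def grey_relational_grade(x, y, rho=0.5):
    """Grey relational grade between two min-max scaled series (1.0 means identical shape)."""
    def scale(s):
        lo, hi = min(s), max(s)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in s]

    deltas = [abs(a - b) for a, b in zip(scale(x), scale(y))]
    d_min, d_max = min(deltas), max(deltas)
    if d_max == 0:
        return 1.0
    return mean([(d_min + rho * d_max) / (d + rho * d_max) for d in deltas])


def select_variables(data, labels, redundancy_threshold=0.9):
    """data[n][v] is the series of variable v in sample n. Returns (selected indices, scores)."""
    n_vars = len(data[0])
    scores = [separability([s[v] for s in data], labels) for v in range(n_vars)]
    ranked = sorted(range(n_vars), key=lambda v: scores[v], reverse=True)
    selected = []
    for v in ranked:
        # Average the grade over samples; drop v if it closely tracks a kept variable.
        redundant = any(
            mean([grey_relational_grade(s[v], s[u]) for s in data]) >= redundancy_threshold
            for u in selected
        )
        if not redundant:
            selected.append(v)
    return selected, scores


# Toy data: variable 1 is a linear copy of variable 0; variable 2 does not separate the classes.
base = [[0, 1, 2, 3], [0, 1, 2, 4], [3, 2, 1, 0], [4, 2, 1, 0]]
noise = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
data = [[b, [2 * v + 1 for v in b], n] for b, n in zip(base, noise)]
labels = [0, 0, 1, 1]
selected, scores = select_variables(data, labels)
print(selected, scores)
```

On this toy input, only one of the two linearly related variables is kept, because their grade is 1.0, while the weakly discriminative third variable is kept because it is not redundant with the first choice. Choosing the final subset size, for example by cross-validated accuracy, is left out of the sketch.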

Keywords/Search Tags:

time series, feature selection, clustering, class separability, shapelets

Related items

1	Research On The Extraction Of U-shapelets In Time Series Clustering
2	Research On Multivariate Time Series Clustering Algorithm And Application Of Stock Selection Strategy
3	Research On Time Series Classification Method Based On Combination Shapelets
4	Research On Feature Representation And Attribute Selection Algorithm For Multivariate Time Series
5	Research On Feature Representation And Clustering Algorithm For Time Series
6	Time Series Clustering And Influencing Factors Analysis Of Lake Area Change In Qinghai Tibet Plateau
7	Research And Implementation Of Time Series Prediction Method Based On Feature Fusion
8	A Feature Selection Algorithm For Biological Data Based On Dynamic Iterative Spectral Clustering
9	Research And Application On Interval Time Series Clustering Based On DTW
10	Research On Features Selection Method In Time Series Of Remote Sensing Data In Vegetation Classification