Research On Feature Representation And Clustering Algorithm For Time Series

Posted on: 2019-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: H J Ji
Full Text: PDF
GTID: 2370330566472829
Subject: Computer Science and Technology
Abstract/Summary:
With the development of big data technology, the application of data mining to time series has attracted growing attention, and research results on time series have been applied successfully in many fields. Time series feature representation effectively addresses the "curse of dimensionality" by mapping a time series from a high-dimensional space to a low-dimensional one. The approximate representation obtained after dimension reduction is better suited to classification and clustering. Time series clustering is one of the most important tasks in time series data mining: unsupervised clustering algorithms group intrinsically similar sequences into the same class. This thesis takes time series as its research object and discusses their feature representation and clustering. First, a feature representation method is used to reduce the dimension of the time series data. Then, a clustering algorithm is used to mine latent class information in the time series. Finally, the new feature representation and the new clustering algorithm are applied in the music field, where analysis of music time series data can uncover trends in music popularity. The specific research is as follows:

(1) The original symbolic aggregate approximation (SAX) representation does not consider the morphological trend within a sequence segment, and it cannot distinguish two time series whose corresponding segments all carry the same symbol. A symbolic aggregate approximation representation based on the beginning and end distance (SAX_SM) is therefore proposed. First, SAX_SM describes the low-dimensional sequence using both the morphological features and the symbolic features of each sequence segment; this joint description addresses the high dimensionality of the time series. Second, SAX_SM uses the morphological features of each sequence segment to construct the beginning and end distance and combines it with the symbol distance to form a new distance metric. The SAX_SM distance metric can still measure the distance when every pair of corresponding segments in two time series has the same symbol. Experiments show that SAX_SM achieves the highest classification accuracy on 13 data sets, indicating better classification performance.

(2) The K-Means algorithm is particularly sensitive to outliers, so a new time series clustering algorithm, K-Center, is proposed. K-Center selects the sequence that is nearest to the other time series as the new cluster center. Because the new center is an existing time series, this effectively mitigates the influence of noise and outlier sequences on K-Means during centroid adjustment and sequence assignment. In addition, K-Center pre-calculates the distances between all time series, so sample assignment and center adjustment can be carried out directly from those distances. This avoids recomputing the distance metric inside the loop, as K-Means must, and reduces the amount of computation. Experiments show that K-Center scores 0.043 higher than K-Means on the Rand Index and 0.038 higher on Cluster Accuracy. K-Center therefore improves the effectiveness of time series clustering.

(3) Building on the main research results of the project, the application of the SAX_SM method and the K-Center algorithm in the music field is discussed through the prediction and analysis of music popularity. First, the high dimensionality of the music data is reduced with the SAX_SM method. Second, the music data are grouped by shape with the K-Center algorithm. Finally, the music data are predicted by combining a multilayer perceptron with an autoregressive integrated moving average (ARIMA) model. Experiments show that the SAX_SM method and the K-Center algorithm are effective for music time series.
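The K-Center procedure described in item (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the exact update or stopping rule, so the code below assumes a medoid-style update (the member with the smallest total distance to the other members of its cluster becomes the center) and stops when the centers no longer change. Plain Euclidean distance stands in for the SAX_SM metric, and the names pairwise_distances, k_center, init_centers and max_iter are hypothetical.

```python
import numpy as np


def pairwise_distances(series):
    """Distance matrix between all equal-length series, computed once.

    Assumption: plain Euclidean distance stands in for the thesis's metric.
    """
    x = np.asarray(series, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def k_center(dist, init_centers, max_iter=100):
    """Cluster using only the precomputed matrix `dist`.

    `init_centers` (hypothetical parameter) lists the indices of the
    series used as starting centers. Returns (center indices, labels).
    """
    centers = np.asarray(init_centers, dtype=int)
    for _ in range(max_iter):
        # Assignment: each series joins its nearest current center.
        labels = np.argmin(dist[:, centers], axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # assumption: an empty cluster keeps its old center
            # Update: the member with the smallest total distance to the
            # other members becomes the center, so it is always a real series.
            totals = dist[np.ix_(members, members)].sum(axis=1)
            new_centers[j] = members[np.argmin(totals)]
        if np.array_equal(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    labels = np.argmin(dist[:, centers], axis=1)
    return centers, labels


# Toy data: six short series built from one shape at different levels.
shape = np.array([0.0, 1.0, 0.0, -1.0])
series = [shape + level for level in (0, 1, 3, 20, 21, 23)]
dist = pairwise_distances(series)
centers, labels = k_center(dist, init_centers=[0, 1])
print(centers, labels)
```

Because the loop only indexes into the precomputed matrix, replacing the Euclidean stand-in with another metric would change pairwise_distances alone. Like other medoid-style procedures, this sketch can settle on a poor local solution for an unlucky choice of starting centers.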
Keywords/Search Tags: time series, feature representation, distance measure, clustering, music popularity, song playback