
Comparing Clustering Algorithm Using Time-Series Data

Posted on: 2020-08-23 | Degree: Master | Type: Thesis
Country: China | Candidate: M Y D u a n g r u x T a n
GTID: 2370330590451977 | Subject: Statistics
Abstract/Summary:
Data clustering is one of the most popular unsupervised machine learning approaches. Clustering helps identify patterns among similar data and can support solutions to many commercial problems. For example, in a taxi-booking application, customer data can be clustered to match supply with demand; clustering can also detect fraud patterns in e-commerce transactions or group customers in a dating application. Each clustering method has its own requirements, such as basic assumptions about the data, and analyzing data under a wrong assumption produces low-quality results. We therefore study and compare clustering on this type of data in depth. Time-series analysis is used in many tasks that predict future values from previously observed ones. By combining cluster analysis with time-series data, the researcher aims to give the public a better understanding of clustering, and hopes that future researchers will refer to this work, develop the theory further, and apply it to a wider range of problems.

This paper focuses on comparing time-series clustering algorithms on common financial time-series data: cryptocurrencies, currency exchange rates, the Shanghai Stock Exchange 50 (SSE 50), and the Stock Exchange of Thailand 50 (SET 50). The paper is divided into four main parts:
1. An introduction to the importance of data mining, machine learning, and time-series clustering, together with some related methods, which lays the theoretical foundation for the research.
2. A review of related work on data mining, machine learning, and time-series clustering in many fields, such as bioinformatics, robotics, medicine, chemistry, gesture recognition, speech recognition, tracking, finance, biometrics, astronomy, and manufacturing.
3. An analysis of the structure of time-series clustering, which consists of several components: distance measurement, a time-series prototype, a clustering algorithm, and clustering evaluation. The clustering algorithm is set up in three scenarios for each dataset: hierarchical clustering, partitional clustering with k-medoids, partitional clustering with k-Shape, and partitional clustering with TADPole. To judge whether each scenario suits a given dataset, the clusterings are evaluated with the Silhouette, COP, Davies-Bouldin (DB), DB*, and Calinski-Harabasz (CH) indices.
4. An empirical analysis that compares time-series clustering under the three algorithm scenarios for each dataset and uses the five indices to assess the validity of each clustering algorithm.

The results show that the hierarchical algorithm is the most effective for the unequal-length cryptocurrency and SSE 50 series, whereas the partitional algorithm is the most effective for the equal-length exchange-rate and SET 50 series.
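The abstract names distance measurement as the first component of time-series clustering, and dynamic time warping (DTW) appears among the keywords. Below is a minimal sketch of DTW, assuming absolute difference as the local cost and an optional Sakoe-Chiba window; the function name `dtw_distance` and its `window` parameter are illustrative and not taken from the thesis. Because DTW aligns points elastically, it can compare series of unequal length.

```python
import math


def dtw_distance(x, y, window=None):
    """DTW distance between two numeric sequences (lengths may differ).

    Local cost is |x[i] - y[j]| (an assumption for this sketch). `window` is
    an optional Sakoe-Chiba band half-width; None means unconstrained.
    """
    n, m = len(x), len(y)
    if window is None:
        window = max(n, m)
    # The band must be at least |n - m| wide, or no alignment reaches (n, m).
    window = max(window, abs(n - m))
    # cost[i][j] = cheapest cumulative cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances, y repeats
                cost[i][j - 1],      # y advances, x repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Usage: a repeated value is absorbed by warping, so the distance is zero.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

The full dynamic-programming table keeps the sketch readable; a production version would keep only two rows and add lower-bound pruning.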
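For the k-Shape scenario, k-Shape compares series with a shape-based distance (SBD): one minus the maximum normalized cross-correlation over all shifts, computed on z-normalized series. The sketch below evaluates every shift directly in O(n*m) time, whereas the published k-Shape algorithm uses the FFT. It covers only the distance, not k-Shape's centroid (shape extraction) step. The function names and the convention for constant series are assumptions.

```python
import math


def z_normalize(x):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    if sd == 0:
        return [0.0] * len(x)
    return [(v - mu) / sd for v in x]


def sbd(x, y):
    """Shape-based distance: 1 - max over shifts of NCC(x, y), in [0, 2].

    0 means the z-normalized series match exactly at some shift. Returns 1.0
    when either series is constant (a convention chosen for this sketch).
    """
    x, y = z_normalize(x), z_normalize(y)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0:
        return 1.0
    n, m = len(x), len(y)
    best = -math.inf
    for shift in range(-(m - 1), n):
        # Cross-correlation at this shift: sum of x[i] * y[i - shift] over the overlap.
        cc = sum(x[i] * y[i - shift] for i in range(max(0, shift), min(n, m + shift)))
        best = max(best, cc)
    return 1.0 - best / norm


# Usage: a shifted copy is much closer in shape than a mirror image.
bump = [0, 0, 1, 2, 1, 0, 0]
print(round(sbd(bump, [0, 1, 2, 1, 0, 0, 0]), 3))
print(round(sbd(bump, [0, 0, -1, -2, -1, 0, 0]), 3))
```

Z-normalization makes SBD invariant to scaling and offset, which suits price series quoted in different units.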
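The hierarchical scenario can be illustrated with agglomerative clustering over a precomputed distance matrix, such as pairwise DTW distances, which also works when series have unequal lengths. The sketch below assumes average linkage and cuts the tree at a fixed number of clusters `k`; the abstract does not state which linkage was used, and the function name is illustrative. Linkages are recomputed naively, which is adequate for a few dozen series.

```python
def average_linkage_clusters(dist, k):
    """Agglomerative clustering with average linkage on an n x n distance matrix.

    Starts with one cluster per series and repeatedly merges the closest pair
    until k clusters remain. Returns one integer label per series.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average linkage: mean of all cross-cluster pairwise distances.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best_d, best_p, best_q = None, None, None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best_d is None or d < best_d:
                    best_d, best_p, best_q = d, p, q
        clusters[best_p] += clusters[best_q]  # merge q into p
        del clusters[best_q]                  # q > p, so p's index is unchanged

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Usage with a toy matrix (absolute differences between scalars,
# standing in for pairwise DTW distances).
points = [0, 1, 2, 10, 11]
dist = [[abs(a - b) for b in points] for a in points]
print(average_linkage_clusters(dist, 2))  # [0, 0, 0, 1, 1]
```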
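In the k-medoids scenario, each cluster's prototype is one of the actual series (the medoid), so only pairwise distances are needed. The sketch below uses the simple alternating (Voronoi-iteration) variant rather than full PAM swaps, and starts from the first `k` series unless `init` is given; both choices, and the function name, are assumptions for illustration.

```python
def k_medoids(dist, k, init=None, max_iter=100):
    """Alternating k-medoids on an n x n distance matrix.

    Repeats two steps until the medoids stop changing: assign every series to
    its nearest medoid, then move each medoid to the member with the smallest
    total distance to the rest of its cluster. Returns (labels, medoids).
    """
    n = len(dist)
    # Hypothetical default: the first k series serve as initial medoids.
    medoids = list(init) if init is not None else list(range(k))

    def assign(meds):
        # Nearest medoid for every series (ties go to the lower cluster index).
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the medoid of an empty cluster
            else:
                new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    # Final assignment so the labels always match the returned medoids.
    return assign(medoids), medoids


points = [0, 1, 2, 10, 11]
dist = [[abs(a - b) for b in points] for a in points]
print(k_medoids(dist, 2))  # ([0, 0, 0, 1, 1], [1, 3])
```

Because it never averages series, this approach works with DTW on unequal-length data, unlike a mean-based centroid.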
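TADPole builds on density-peaks clustering. Each series gets a local density (how many series lie within a cutoff distance `dc`) and a distance to the nearest denser series, and series that score high on both become cluster centers. The sketch below shows only this density-peaks core on a full distance matrix. TADPole itself additionally prunes DTW computations using distance bounds, which is omitted here. The cutoff kernel, the `rho * delta` center rule, and the function name are assumptions for illustration.

```python
def density_peaks(dist, k, dc):
    """Density-peaks clustering on an n x n distance matrix.

    rho[i]   = number of other series closer than the cutoff dc
    delta[i] = distance to the nearest series ranked denser than i
    Centers are the densest series plus the k - 1 others with the largest
    rho * delta; every other series inherits the label of its nearest
    denser neighbour.
    """
    n = len(dist)
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    # Rank by decreasing density; ties broken by index so the ranking is total.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n
    delta[order[0]] = max(dist[order[0]])  # densest series: its farthest distance
    for pos in range(1, n):
        i = order[pos]
        parent[i] = min(order[:pos], key=lambda j: dist[i][j])
        delta[i] = dist[i][parent[i]]
    # The densest series has no denser neighbour, so it must be a center.
    rest = sorted(order[1:], key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + rest[: k - 1]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Visiting in density order guarantees each parent is labelled first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
print(density_peaks(dist, 2, dc=1.5))  # [0, 0, 0, 1, 1, 1]
```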
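Among the five evaluation indices, the Silhouette index needs only pairwise distances, so it can be computed directly from a DTW or SBD distance matrix. A minimal sketch, assuming at least two clusters and the common convention that singleton clusters contribute a silhouette of 0; the function name is illustrative.

```python
def silhouette(dist, labels):
    """Mean silhouette width from an n x n distance matrix (needs >= 2 clusters).

    For series i: a = mean distance to the rest of its cluster, b = lowest mean
    distance to any other cluster, s(i) = (b - a) / max(a, b). Singletons count
    as s(i) = 0. Higher is better; the maximum is 1.
    """
    n = len(dist)
    clusters = set(labels)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
print(round(silhouette(dist, [0, 0, 0, 1, 1, 1]), 3))  # 0.866
print(round(silhouette(dist, [0, 1, 0, 1, 0, 1]), 3))  # much lower for a poor split
```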
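The DB and DB* indices differ in one step: DB takes, for each cluster, the worst ratio of combined scatter to centroid separation, while DB* divides the largest combined scatter by the smallest separation. The sketch below assumes equal-length series treated as vectors, Euclidean distance, and arithmetic-mean centroids. In the time-series setting the abstract describes, the chosen prototype and distance (for example DTW) would presumably take their place. Function names are illustrative.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(members):
    # Arithmetic-mean prototype; assumes equal-length series.
    return [sum(col) / len(members) for col in zip(*members)]


def davies_bouldin(series, labels):
    """Return (DB, DB*) for a labelling; lower is better for both.

    S_i is the mean distance of cluster i's members to its centroid and d_ij
    the distance between centroids i and j.
    DB  averages, over clusters, max_j (S_i + S_j) / d_ij.
    DB* averages, over clusters, max_j (S_i + S_j) / min_j d_ij.
    """
    groups = {}
    for s, l in zip(series, labels):
        groups.setdefault(l, []).append(s)
    cent = {c: centroid(m) for c, m in groups.items()}
    scatter = {c: sum(euclidean(s, cent[c]) for s in m) / len(m) for c, m in groups.items()}
    db = db_star = 0.0
    for i in groups:
        others = [j for j in groups if j != i]
        db += max((scatter[i] + scatter[j]) / euclidean(cent[i], cent[j]) for j in others)
        db_star += max(scatter[i] + scatter[j] for j in others) / min(
            euclidean(cent[i], cent[j]) for j in others
        )
    return db / len(groups), db_star / len(groups)


series = [[0], [2], [10], [12], [28], [34]]
db, db_star = davies_bouldin(series, [0, 0, 1, 1, 2, 2])
print(round(db, 3), round(db_star, 3))  # 0.2 0.333
```

DB* is never smaller than DB, so it penalizes a cluster that is both wide and close to a neighbour more heavily.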
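The Calinski-Harabasz (CH) index compares between-cluster dispersion with within-cluster dispersion, each scaled by its degrees of freedom. A minimal sketch, assuming equal-length series as vectors, squared Euclidean distance, and mean centroids; the function name is illustrative.

```python
def calinski_harabasz(series, labels):
    """Calinski-Harabasz index (higher is better) for equal-length series.

    CH = [B / (k - 1)] / [W / (n - k)], where B is the between-cluster and W
    the within-cluster sum of squared Euclidean distances to mean centroids.
    Requires 1 < k < n and W > 0.
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    def mean_vec(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    n = len(series)
    groups = {}
    for s, l in zip(series, labels):
        groups.setdefault(l, []).append(s)
    k = len(groups)
    overall = mean_vec(series)
    between = within = 0.0
    for members in groups.values():
        c = mean_vec(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(s, c) for s in members)
    return (between / (k - 1)) / (within / (n - k))


print(calinski_harabasz([[0, 0], [0, 2], [10, 0], [10, 2]], [0, 0, 1, 1]))  # 50.0
```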
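The COP index weighs each cluster's compactness (mean distance to its centroid) against its separation (for the nearest outside series, the farthest distance to any cluster member). A minimal sketch, assuming equal-length series, Euclidean distance, and mean centroids; the function name is illustrative.

```python
import math


def cop_index(series, labels):
    """COP index (lower is better) for equal-length series.

    For each cluster C: intra = mean distance of its members to the centroid;
    inter = min over series outside C of the max distance to any member of C.
    COP = (1 / n) * sum over clusters of |C| * intra / inter.
    """
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    n = len(series)
    total = 0.0
    for c in set(labels):
        members = [s for s, l in zip(series, labels) if l == c]
        outside = [s for s, l in zip(series, labels) if l != c]
        cent = [sum(col) / len(members) for col in zip(*members)]
        intra = sum(dist(s, cent) for s in members) / len(members)
        inter = min(max(dist(o, m) for m in members) for o in outside)
        total += len(members) * intra / inter
    return total / n


series = [[0, 0], [0, 2], [10, 0], [10, 2]]
print(round(cop_index(series, [0, 0, 1, 1]), 4))  # 0.0981
print(round(cop_index(series, [0, 1, 0, 1]), 4))  # 0.4903
```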
Keywords/Search Tags: time-series clustering, machine learning, dynamic time warping, cryptocurrency