
Empirical Study On Clustering, Dimensionality Reduction And Visualization Of Classic Wine Data Sets

Posted on: 2020-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: D Xia
Full Text: PDF
GTID: 2381330599961026
Subject: Probability theory and mathematical statistics
Abstract/Summary:
This paper takes the classic Wine Data Set as its research object. The data set contains 178 samples in three categories, and each sample records measurements of 13 different components. Data sets of different dimensions are constructed from the Wine Data Set and then clustered with four methods: hierarchical clustering (HC), K-means partition clustering, DBSCAN density-based clustering, and EM model-based clustering. The dimension is increased step by step and the clustering results are compared, in order to explore how robust the four clustering methods are as the data dimension grows. To display and compare the various clustering results visually and succinctly, this paper designs a "palette" for comparing clustering results in different situations. It introduces the concepts of similarity, degree of fragmentation, and ideal degree, and determines the main color of each class by maximizing the value of the ideal degree. Conditional formatting in Excel is then used to draw the "palette" quickly from the clustering results, thereby visualizing them. The paper also introduces the t-SNE (t-distributed Stochastic Neighbor Embedding) dimensionality reduction algorithm, and then combines the four clustering methods with t-SNE: data of different dimensions are clustered, and the clustering results before and after dimensionality reduction are compared. In this way the paper explores the feasibility of using the t-SNE algorithm in combination with four commonly used clustering algorithms.
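
The comparison described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's own code. It assumes scikit-learn is available and makes several choices the abstract does not specify: the first `n_features` columns of the Wine data stand in for "data of different dimensions", `GaussianMixture` stands in for EM clustering, the adjusted Rand index is used to score agreement with the three true classes, and both DBSCAN `eps` values are untuned guesses. The names `cluster_all` and `compare` are hypothetical.

```python
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans
from sklearn.datasets import load_wine
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def cluster_all(X, eps):
    """Run the four clustering methods on X and return {name: labels}."""
    return {
        # Ward-linkage agglomerative (hierarchical) clustering.
        "HC": AgglomerativeClustering(n_clusters=3).fit_predict(X),
        "K-means": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
        # DBSCAN marks noise points with the label -1; eps is an assumption.
        "DBSCAN": DBSCAN(eps=eps, min_samples=5).fit_predict(X),
        # A Gaussian mixture model, which scikit-learn fits with EM.
        "EM": GaussianMixture(n_components=3, random_state=0).fit_predict(X),
    }


def compare(n_features=13, eps_raw=2.5, eps_tsne=3.0):
    """Score each method before and after t-SNE on the first n_features columns.

    Returns {method: (ARI on standardized data, ARI on the 2-D t-SNE embedding)}.
    n_features must be at least 2 because the embedding is initialized with PCA.
    """
    X, y = load_wine(return_X_y=True)
    # Assumption: lower-dimensional data sets are built from leading columns.
    X = StandardScaler().fit_transform(X[:, :n_features])
    X_2d = TSNE(
        n_components=2, perplexity=30, init="pca", random_state=0
    ).fit_transform(X)

    before = cluster_all(X, eps_raw)
    after = cluster_all(X_2d, eps_tsne)
    return {
        name: (
            adjusted_rand_score(y, before[name]),
            adjusted_rand_score(y, after[name]),
        )
        for name in before
    }


if __name__ == "__main__":
    for d in (4, 8, 13):
        print(f"dimension = {d}")
        for name, (ari_raw, ari_tsne) in compare(n_features=d).items():
            print(f"  {name:8s} raw ARI = {ari_raw:.3f}  t-SNE ARI = {ari_tsne:.3f}")
```

Because DBSCAN is sensitive to `eps` and t-SNE changes the distance scale, the two `eps` values would need to be tuned separately in any real comparison. The sketch does not reproduce the thesis's "palette" or its similarity, fragmentation, and ideal-degree measures, whose definitions the abstract does not give.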
Keywords/Search Tags: common clustering methods, robustness, visualization, high-dimensional data, t-SNE dimensionality reduction algorithm