Empirical Study On Clustering, Dimensionality Reduction And Visualization Of Classic Wine Data Sets

Posted on: 2020-01-19

Degree: Master

Type: Thesis

Country: China

Candidate: D Xia


GTID: 2381330599961026

Subject: Probability theory and mathematical statistics

Abstract/Summary:


This thesis takes the classic Wine Data Set as its research object. The data set contains 178 samples in three classes, and each sample records measurements of 13 different constituents. Data sets of different dimensions are constructed from the Wine Data Set and clustered with four methods: hierarchical clustering (HC), K-means partitional clustering, DBSCAN density-based clustering, and EM model-based clustering. The dimension is then increased and the clustering results are compared, in order to examine how robust each of the four methods is as the data dimension grows.

To display and compare the clustering results visually and concisely, the thesis designs a "palette" for comparing results across different settings. It introduces the concepts of similarity, degree of fragmentation, and ideal degree, and chooses the main color of each class so that the ideal degree is maximized. Excel's conditional formatting is then used to draw the palette quickly from the clustering results, so that the various results can be compared at a glance.

The thesis then introduces the t-SNE (t-distributed Stochastic Neighbor Embedding) dimensionality reduction algorithm, combines it with each of the four clustering methods to cluster data of different dimensions, and compares the clustering results before and after dimensionality reduction, in order to assess the feasibility of pairing t-SNE with these four commonly used clustering algorithms.
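The experimental pipeline summarized above can be sketched in Python with scikit-learn. This is a minimal, hypothetical sketch rather than the thesis's own code, and it rests on several assumptions the abstract does not state: each lower-dimensional data set is taken as the first d standardized features, HC uses Ward linkage, EM clustering is modelled as a Gaussian mixture, the DBSCAN `eps` values are untuned guesses, and the adjusted Rand index (ARI) stands in for the thesis's palette-based similarity, fragmentation, and ideal-degree measures, whose definitions are not given here.

```python
"""Hypothetical sketch: four clustering methods on Wine subsets, before and after t-SNE."""
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import load_wine
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

METHODS = ("HC", "K-means", "DBSCAN", "EM")


def cluster_all(X, eps, k=3, seed=0):
    """Return a label array from each of the four clustering families."""
    return {
        # Ward linkage is an assumption; the abstract does not name the linkage.
        "HC": AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X),
        "K-means": KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X),
        # eps is an untuned, hypothetical value; DBSCAN labels noise points -1.
        "DBSCAN": DBSCAN(eps=eps, min_samples=5).fit_predict(X),
        # EM clustering modelled here as a Gaussian mixture fitted by EM.
        "EM": GaussianMixture(n_components=k, random_state=seed).fit(X).predict(X),
    }


def run(dims=(2, 5, 9, 13), seed=0):
    """Map (dimension, method) to (ARI on raw data, ARI on the t-SNE embedding)."""
    data = load_wine()
    y = data.target
    results = {}
    for d in dims:
        # Assumption: the d-dimensional data set is the first d standardized features.
        X = StandardScaler().fit_transform(data.data[:, :d])
        # Distances between standardized points grow roughly like sqrt(d).
        raw = cluster_all(X, eps=0.6 * np.sqrt(d), seed=seed)
        # Reduce to 2-D with t-SNE, then cluster the embedding with the same methods.
        emb = TSNE(n_components=2, perplexity=30, random_state=seed).fit_transform(X)
        low = cluster_all(emb, eps=3.0, seed=seed)
        for name in METHODS:
            results[(d, name)] = (
                adjusted_rand_score(y, raw[name]),
                adjusted_rand_score(y, low[name]),
            )
    return results


if __name__ == "__main__":
    for (d, name), (before, after) in sorted(run().items()):
        print(f"d={d:2d}  {name:8s}  ARI raw={before:.3f}  after t-SNE={after:.3f}")
```

Running the script prints one line per dimension and method. The exact scores depend on the scikit-learn version, the seed, and the assumed `eps` values, so none are quoted here; a palette in the thesis's sense would instead be built from the class-by-cluster assignments themselves.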

Keywords/Search Tags:

Common clustering methods, robustness, visualization, high-dimensional data, t-SNE dimensionality reduction algorithm


Related items

1	Research On Key Technologies Of Data Preprocessing Method In Near Infrared Spectroscopy Analysis Based On Machine Learning Algorithm
2	Research On LLTSA Method Based On Nonlinear Data Dimensionality Reduction
3	Research And Application Of Multi-dimensional Data Visualization Chart Recommendation Method
4	Steam flooding screening and EOR prediction by using clustering algorithm and data visualization
5	Research On Dimensionality Reduction And Classification Method For Food Safety Data
6	Research On Fault Diagnosis Based On Nonlinear Dimensionality Reduction Method And SOM Algorithm
7	Research On Leakage Detection Of Natural Gas Pipeline Based On Dimensionality Reduction Algorithm
8	Research On Analysis And Visualization Of Atmospheric Pollutant Data In Tianjin
9	High-dimensional Index Information Clustering Algorithm For Tobacco Leaf RAW Materials
10	Research And Application Of Clustering Algorithm For Aluminum Electrolytic Cell Based On DTC