Class Equality Cluster Validity Index And Cluster Filter K-Means Algorithm

Posted on: 2022-06-18

Degree: Master

Type: Thesis

Country: China

Candidate: N Yu

GTID: 2568306323470804

Subject: Software engineering

Abstract/Summary:

Cluster analysis is an important research direction in data mining, with applications in fields such as economics, agriculture, medicine, and petroleum exploration. It comprises two parts: clustering algorithms and cluster validity indexes. Evaluating clustering results effectively remains a challenging task. Cluster validity indexes fall into two groups, internal and external. This thesis studies external cluster validity indexes, which use category label information when evaluating clustering results. Many external indexes have been proposed, and they can be grouped into pair-counting, information-theoretic, and set-matching indexes. However, these indexes share a problem: they use cluster sizes in their calculation, so clusters of different sizes affect the index to different degrees. To address this problem, this thesis proposes a class equality cluster validity index, which treats all classes as equal regardless of how many samples they contain. The thesis compares and analyzes 6 cluster validity indexes on 4 artificial data sets and 19 real data sets, verifying the validity and superiority of the proposed index.

The K-means algorithm is simple to implement, easy to understand, and fast to run, and it is one of the best-known and most widely used clustering algorithms. However, K-means is sensitive to initialization: it requires the initial cluster centers to be specified, and poor initialization leads to poor clustering results. Many initialization algorithms have been proposed, but they still need to be run multiple times to determine the optimal clustering result. To solve this problem, this thesis proposes cluster filter K-means, which judges whether a cluster is valid by comparing the density at the cluster center with the density at the cluster edge, and re-clusters the invalid clusters. The thesis compares and analyzes 6 benchmark algorithms on 13 public artificial data sets and 19 real data sets, verifying the effectiveness and superiority of the proposed algorithm.
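The abstract does not give the formula of the class equality index, so the following is only a sketch of the size dependence it targets. It uses the classic set-matching F-measure, where each class's best F-score is weighted by its share of samples n_k / n, and contrasts it with a hypothetical class-equal variant that gives every class weight 1 / K. The function names are illustrative assumptions, not the thesis's definitions.

```python
from collections import Counter


def best_f_per_class(labels_true, labels_pred):
    """For each class k, return (n_k, max over clusters c of F(k, c)).

    F(k, c) = 2 * n_kc / (n_k + n_c), the harmonic mean of
    precision n_kc / n_c and recall n_kc / n_k.
    """
    class_sizes = Counter(labels_true)                # n_k
    cluster_sizes = Counter(labels_pred)              # n_c
    overlap = Counter(zip(labels_true, labels_pred))  # n_kc (0 if absent)
    result = {}
    for k, n_k in class_sizes.items():
        best = max(2 * overlap[(k, c)] / (n_k + n_c)
                   for c, n_c in cluster_sizes.items())
        result[k] = (n_k, best)
    return result


def size_weighted_f(labels_true, labels_pred):
    """Classic set-matching F-measure: each class weighted by n_k / n."""
    per_class = best_f_per_class(labels_true, labels_pred)
    n = len(labels_true)
    return sum(n_k * f for n_k, f in per_class.values()) / n


def class_equal_f(labels_true, labels_pred):
    """Hypothetical class-equal variant: every class weighted by 1 / K."""
    per_class = best_f_per_class(labels_true, labels_pred)
    return sum(f for _, f in per_class.values()) / len(per_class)


if __name__ == "__main__":
    y_true = ["A"] * 90 + ["B"] * 10
    merged = [0] * 100  # minority class B absorbed into a single cluster
    print(round(size_weighted_f(y_true, merged), 3))  # 0.871
    print(round(class_equal_f(y_true, merged), 3))    # 0.565
```

With 90 samples of class A and 10 of class B merged into one cluster, the size-weighted score stays high because class A dominates, while the class-equal average drops because the absorbed minority class counts as much as the majority class.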
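The initialization sensitivity of K-means can be reproduced with plain Lloyd's K-means; this is not the thesis's cluster filter K-means, whose density test the abstract does not specify. The sketch below is a minimal pure-Python version with hypothetical function names. It runs the same 1-D data from two different sets of initial centers and reports the sum of squared errors (SSE).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, max_iter=100):
    """Lloyd's K-means starting from the given initial centers.

    Returns (centers, labels, sse).
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(len(centers)),
                          key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


if __name__ == "__main__":
    data = [(x,) for x in (0, 1, 2, 10, 11, 12, 20, 21, 22)]
    _, _, good = kmeans(data, [(1,), (11,), (21,)])  # one center per group
    _, _, bad = kmeans(data, [(0,), (1,), (2,)])     # all centers in group 1
    print(good, bad)  # 6.0 154.5
```

Starting from one center per group gives SSE 6.0, whereas starting from three neighboring points of the first group converges to a local optimum with SSE 154.5 in which two true groups are merged. Running K-means several times from different starts is the usual workaround; the thesis instead proposes detecting and re-clustering invalid clusters.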

Keywords/Search Tags:

Cluster Analysis, Cluster Validity Index, K-means, Initialization

Related items

1	Research Of Improved K-means Algorithm And New Cluster Validity Index In Cluster Analysis
2	Research On New Cluster Validity Index For Overlapping Datasets In Cluster Analysis
3	The Research And Comparative Analysis Of Cluster Validity Index
4	Research On The New Validity Index Of Internal Clustering And The Method To Determine The Optimal Cluster Number
5	Research Of New Clustering Validity Index In Cluster Analysis
6	Research On Connectivity-based Cluster Validity
7	Research Of Fuzzy Clustering Algorithm And Cluster Validity Index
8	Research On Improved K-means Algorithm And New Cluster Validity Index
9	The Research On Fuzzy C-Means Cluster Analysis And Its Applications
10	Research On New Clustering Validity Index Based On Improved Clustering Algorithm