Implementation And Application Of Global-Relationship Similarity Measure In Clustering

Posted on:2019-08-05

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Zhang


GTID:2428330566967902

Subject:Computer technology

Abstract/Summary:


Clustering analysis is one of the most important research branches of data mining, and one of its most widely used and promising directions. Its main task is to partition data sets according to some similarity measure. Clustering of numerical data has been studied in depth, but the resulting methods are not well suited to the categorical data found in the real world. Research on and improvement of clustering algorithms for categorical data is therefore an important direction in the field of clustering analysis. This thesis introduces the concept of cluster analysis and the related data structures, similarity measures, and objective functions. After analyzing the advantages and disadvantages of the K-Modes algorithm, it proposes a global-relationship similarity measure for categorical data and applies it to the clustering of categorical data and of graphs.

(1) A new clustering algorithm for categorical data based on the global-relationship similarity measure, named KBGRS, is proposed. Analysis of the K-Modes algorithm shows that its simple matching dissimilarity ignores the relationships among the attributes of data objects, which can reduce clustering accuracy. Based on this observation, we propose a global-relationship similarity measure that integrates the relationship between data points and all cluster centers with the differences between attributes. The K-Modes based global-relationship similarity measure algorithm (KBGRS) uses the K-Modes algorithm framework to perform the clustering. Theoretical analysis shows that the update strategies of KBGRS for cluster modes and membership degrees minimize the objective function and that the algorithm converges within a finite number of iterations. Experimental analysis shows that KBGRS can effectively cluster categorical data sets.

(2) A new clustering algorithm for graphs, named AF-Cluster, is proposed. Analysis shows that traditional graph clustering algorithms focus on either the topological structure or the vertex features of the graph, and few algorithms combine the two. To effectively cluster undirected graphs whose vertices carry categorical attributes, we propose the concepts of direct attraction-force and indirect attraction-force and use them to define the structural similarity between vertices in the AF-Cluster algorithm. AF-Cluster uses the global-relationship similarity as the attribute similarity between vertices, and then combines structural similarity and attribute similarity through a collaborative strategy to define the overall similarity between vertices. AF-Cluster uses the K-Medoids framework for clustering. Theoretical and experimental analysis show that the AF-Cluster algorithm converges and produces good clustering results.
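To make the K-Modes baseline discussed in the abstract concrete, the following is a minimal sketch of standard K-Modes with simple matching dissimilarity. It is not the thesis's KBGRS or AF-Cluster algorithm, whose formulas the abstract does not give. The function names (`simple_matching`, `column_modes`, `k_modes`), the toy records, and the choice of initial modes are all assumptions made for illustration.

```python
from collections import Counter

def simple_matching(x, y):
    """Simple matching dissimilarity: count of attributes where x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def column_modes(records):
    """Most frequent value of each attribute (tie-breaking left to Counter)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def k_modes(records, init_modes, max_iter=100):
    """Plain K-Modes: assign each record to its nearest mode, then recompute modes."""
    modes = [tuple(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode under simple matching dissimilarity.
        new_labels = [
            min(range(len(modes)), key=lambda j: simple_matching(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        # Update step: each mode becomes the attribute-wise mode of its cluster.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old mode if a cluster is empty
                modes[j] = column_modes(members)
    return labels, modes

# Hypothetical toy data: (colour, size, shape).
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("blue", "small", "square"),
]
labels, modes = k_modes(records, init_modes=[records[0], records[3]])
```

Note that `simple_matching` counts every attribute mismatch as exactly 1, regardless of which attribute differs or how the values are distributed across clusters. This is the limitation the abstract points to when it says the measure ignores relationships among attributes.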

Keywords/Search Tags:

Clustering analysis, Similarity measure, Categorical data, Graph


Related items

1	Categorical Relation Graph Construction And Clustering Analysis For Categorical Data
2	The Research On Clustering Algorithm For Categorical Data Using Quantum Mechanics
3	Studies On Clustering Algorithms For Categorical Data
4	Studies On Clustering Algorithms For Categorical Data
5	Automatic categorical data clustering and spatial data clustering by consecutive resolution refinement
6	Research On Subspace Clustering Algorithm For Categorical Data
7	Similarity Measures And New Clustering Methods For Categorical Sequences
8	A Study On Clustering Algorithms For Categorical Data With Applications
9	Study Of Algorithms For Clustering Categorical Data
10	Multi-view Clustering Based On The Similarity Graph Fusion