Implementation And Application Of Global-Relationship Similarity Measure In Clustering

Posted on: 2019-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Zhang
Full Text: PDF
GTID: 2428330566967902
Subject: Computer technology
Abstract/Summary:
Cluster analysis is one of the most important research branches of data mining and one of its most common and promising directions. Its main task is to partition a data set according to some similarity measure. Clustering of numerical data has been studied in depth, but the resulting methods are not well suited to the categorical data that are common in the real world. Research on and improvement of clustering algorithms for categorical data is therefore an important direction in cluster analysis. This thesis introduces the concept of cluster analysis and the related data structures, similarity measures, and objective functions. After analyzing the advantages and disadvantages of the K-Modes algorithm, it proposes a global-relationship similarity measure for categorical data and applies it to clustering categorical data and graphs.

(1) A new clustering algorithm for categorical data based on the global-relationship similarity measure, named KBGRS, is proposed. Analysis of the K-Modes algorithm shows that its simple matching dissimilarity ignores the relationships among the attributes of data objects, which can reduce clustering accuracy. Based on this, we propose a global-relationship similarity measure that integrates the relationship between data points and all cluster centers with the differences between attributes. The K-Modes based global-relationship similarity measure algorithm (KBGRS) uses the K-Modes framework to complete the clustering. Theoretical analysis shows that the strategy KBGRS uses to update cluster modes and membership degrees minimizes the objective function, and that the algorithm converges within a finite number of iterations. Experimental analysis shows that KBGRS can effectively cluster categorical data sets.

(2) A new clustering algorithm for graphs, named AF-Cluster, is proposed. Analysis shows that traditional clustering algorithms focus on either the topological structure or the vertex features of a graph, and few combine the two. To effectively cluster undirected graphs whose vertices carry categorical attributes, we propose the concepts of direct attraction-force and indirect attraction-force and define the structural similarity between vertices in the AF-Cluster algorithm. AF-Cluster uses global-relationship similarity as the attribute similarity between vertices, and then combines structural similarity and attribute similarity through a collaborative strategy to define the overall similarity between vertices. AF-Cluster uses the K-Medoids framework for clustering. Theoretical and experimental analysis shows that AF-Cluster converges and produces good clustering results.
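
For readers unfamiliar with the baseline that the abstract criticizes, the following is a minimal Python sketch of the standard K-Modes loop with simple matching dissimilarity. It is not the thesis's KBGRS algorithm or its global-relationship measure, neither of which is defined in the abstract. The function names, the toy records, and the choice of initial modes are all assumptions made for illustration.

```python
from collections import Counter


def simple_matching(x, y):
    """Count the attributes on which two categorical records disagree."""
    return sum(a != b for a, b in zip(x, y))


def column_modes(records):
    """Most frequent value of each attribute; ties go to the value seen first."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(data, init_modes, max_iter=100):
    """Baseline K-Modes: assign each record to its nearest mode, then update modes."""
    modes = [tuple(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(modes)), key=lambda j: simple_matching(x, modes[j]))
            for x in data
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        for j in range(len(modes)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy data: (colour, size, shape)
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("green", "large", "square"),
]
labels, modes = k_modes(data, init_modes=[data[0], data[3]])
print(labels)
print(modes)
```

In this sketch every mismatching attribute contributes exactly 1 to the distance, regardless of which attribute it is or how its values are distributed across clusters. That is the limitation the abstract attributes to simple matching dissimilarity.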
Keywords/Search Tags: Clustering analysis, Similarity measure, Categorical data, Graph