| Unsupervised learning on categorical data has played an increasingly important role in recent years in areas such as pattern recognition, machine learning, data mining, and knowledge discovery. To effectively discover the group structure inherent in a set of categorical objects, many categorical clustering algorithms have been developed in the literature. Nevertheless, compared with the performance of clustering algorithms for numeric data, there is still considerable room for improvement, which may arise from the fact that categorical data lack the clear spatial structure of numeric data. Following domestic and international research frontiers, this thesis explores and experiments with cluster analysis for categorical data as follows: (1) We survey the popular classical clustering algorithms for categorical data in detail, comparatively analyze and list the advantages and disadvantages of the various methods, and from this derive the motivation and objective of the novel algorithm. A novel data representation scheme is employed to map a set of categorical objects into a Euclidean space of new dimensions without any loss of information. Based on this new general framework for clustering categorical data, we employ Carreira-Perpiñán's K-Modes algorithm to find more representative modes (the SBC_K-modes algorithm). Comparisons with four classical clustering algorithms for categorical data on nine categorical data sets downloaded from UCI illustrate the effectiveness of the new method. (2) Building on further assumptions about, and study of, a feasible space structure for categorical data, we propose a new data representation scheme, whose validity and effectiveness are demonstrated by inference and experiment. Based on the new general framework for clustering categorical data, we adopt the K-Means paradigm with two different dissimilarity measures, which naturally yields two algorithms (NSBC and JSBC). Comparisons with four classical clustering algorithms for categorical data on nine categorical data sets downloaded from UCI illustrate the effectiveness of the new methods. In summary, this thesis studies the theory of clustering algorithms for categorical data and proposes novel data representation schemes and the corresponding algorithms. Experimental results on data sets from the UCI repository show the effectiveness of the algorithms. The research results provide new methods and ideas for cluster analysis of categorical data and have application value in domains such as data mining and knowledge discovery. |