Cluster analysis is an important step in data mining: it partitions unorganized data into meaningful groups according to certain rules. These groups support preliminary information mining and the subsequent discovery of key information, so clustering is a research topic of considerable significance. Real-world data contain not only numerical attributes such as height and weight, which can be quantified on a numeric scale, but also categorical attributes such as rating, color, and occupation, which cannot. Clustering algorithms for categorical data therefore deserve particular study. However, because categorical attributes are complex and cannot be quantified numerically, research on such algorithms faces many difficulties. Far fewer results exist than for numerical data clustering, which leaves considerable room for exploration and improvement. Building on existing categorical data clustering algorithms, this thesis proposes several improved algorithms that outperform the baseline algorithms they extend. The main work of this thesis is as follows:

(1) Because different attributes of a categorical data set contribute differently to clustering, we propose a maximum entropy regularized weighted fuzzy K-modes (EWFKM) algorithm. It adds an attribute weight entropy term to the objective function of the fuzzy K-modes algorithm: the within-cluster dispersion is minimized while the entropy of the attribute weights is maximized, which emphasizes the important attributes and yields the final clustering result. Experimental results on five UCI data sets show that, compared with fuzzy K-modes, EWFKM improves clustering accuracy, precision, and recall.

(2) We propose an iterative intuitionistic fuzzy K-modes algorithm. The intuitionistic fuzzy K-modes (IFKM) algorithm uses a simple 0-1 matching similarity measure when clustering categorical data, and in each iteration it assigns each object to a cluster directly from the intuitionistic fuzzy membership matrix. As a result, it does not take full advantage of intuitionistic fuzzy theory. To overcome these shortcomings, we propose an iterative IFKM algorithm. First, we define a weighted similarity measure of intuitionistic fuzzy membership based on intuitionistic fuzzy sets. Second, the intuitionistic fuzzy membership matrix is carried as iterative information through the entire clustering process, so that the intuitionistic fuzzy idea is fully exploited. Experimental results show that the proposed algorithm outperforms the intuitionistic fuzzy K-modes algorithm.

(3) Because the iterative intuitionistic fuzzy K-modes algorithm is strongly affected by its initial centroids, we propose an iterative intuitionistic fuzzy K-modes algorithm based on cuckoo search. The cuckoo search algorithm first finds K high-quality initial centroids for the categorical data; the iterative intuitionistic fuzzy K-modes algorithm then clusters the data starting from these centroids to produce the final result. Experimental results show that, with high-quality initial centroids, the clustering performance of this algorithm improves significantly and is much better than that of the iterative intuitionistic fuzzy K-modes algorithm.
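Contribution (1) adds an attribute weight entropy term to the fuzzy K-modes objective. The abstract does not give that objective in closed form, so the sketch below assumes one common entropy-regularized form: minimize the sum over clusters j and objects i of u_ij^α · Σ_l w_l δ(x_il, z_jl), plus γ Σ_l w_l log w_l, subject to each object's memberships and the attribute weights each summing to 1. Here δ is 0-1 matching, α > 1 is the fuzziness exponent, and γ > 0 sets how strongly weight entropy is rewarded (adding γ Σ w log w is the same as subtracting γ times the entropy). Under this assumption, setting the Lagrangian derivative to zero gives the closed-form weight update w_l ∝ exp(-D_l / γ), where D_l is the fuzzy dispersion on attribute l. The function name `ewfkm`, its parameters, and the random initialization are hypothetical illustrations, not the thesis's implementation.

```python
import math
import random


def mismatch(a, b):
    """0-1 matching dissimilarity: 1 if the categories differ, else 0."""
    return 0.0 if a == b else 1.0


def ewfkm(X, k, alpha=1.5, gamma=1.0, n_iter=20, seed=0):
    """Hypothetical entropy-weighted fuzzy K-modes sketch (assumed objective).

    X: list of equal-length tuples of categorical values.
    Returns (U, Z, W): memberships U[i][j], modes Z[j], attribute weights W[l].
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    Z = [list(x) for x in rng.sample(X, k)]  # initial modes: k random objects
    W = [1.0 / m] * m                        # start from uniform attribute weights
    U = [[0.0] * k for _ in range(n)]
    p = 1.0 / (alpha - 1.0)
    for _ in range(n_iter):
        # 1) Membership update using the attribute-weighted 0-1 distance.
        for i, x in enumerate(X):
            d = [sum(W[l] * mismatch(x[l], Z[j][l]) for l in range(m)) for j in range(k)]
            zeros = [j for j in range(k) if d[j] == 0.0]
            if zeros:  # object coincides with one or more modes: share membership among them
                U[i] = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(k)]
            else:
                U[i] = [1.0 / sum((d[j] / d[t]) ** p for t in range(k)) for j in range(k)]
        # 2) Mode update: per cluster and attribute, pick the category with the
        #    largest total u^alpha (independent of W because every w_l > 0).
        for j in range(k):
            for l in range(m):
                score = {}
                for i, x in enumerate(X):
                    score[x[l]] = score.get(x[l], 0.0) + U[i][j] ** alpha
                Z[j][l] = max(score, key=score.get)
        # 3) Weight update: w_l proportional to exp(-D_l / gamma).
        D = [sum(U[i][j] ** alpha * mismatch(X[i][l], Z[j][l])
                 for i in range(n) for j in range(k)) for l in range(m)]
        lo = min(D)  # shift for numerical stability; it cancels in the normalization
        e = [math.exp(-(D[l] - lo) / gamma) for l in range(m)]
        s = sum(e)
        W = [v / s for v in e]
    return U, Z, W
```

In this sketch, attributes with larger within-cluster mismatch D_l receive smaller weights, and a larger γ pushes the weights toward uniform, which is the regularizing effect of the entropy term.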
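Contribution (3) uses cuckoo search only to choose K initial centroids before the iterative IFKM step. The abstract does not say how nests are encoded or how Lévy flights are adapted to categorical data, so the sketch below rests on stated assumptions: each nest is a set of K distinct data objects used as candidate modes; fitness is the total 0-1 mismatch of every object to its nearest candidate (lower is better); a Lévy step drawn with Mantegna's algorithm (β = 1.5) decides how many candidates to swap; and abandoned nests are simply re-sampled at random, which simplifies the biased random walk used in standard cuckoo search. All function names and parameter values (`n_nests`, `pa`, `n_iter`) are hypothetical.

```python
import math
import random


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(1 for u, v in zip(a, b) if u != v)


def cost(X, nest):
    """Total 0-1 mismatch of every object to its nearest candidate centroid."""
    return sum(min(hamming(x, X[c]) for c in nest) for x in X)


def levy_step(rng, beta=1.5):
    """Levy-distributed step length via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def perturb(rng, nest, n):
    """Hypothetical discrete Levy flight: swap 1 + floor(|step|) candidates
    (capped at k) for objects not already in the nest."""
    k = len(nest)
    s = min(k, 1 + int(abs(levy_step(rng))))
    new = list(nest)
    for pos in rng.sample(range(k), s):
        new[pos] = rng.choice([c for c in range(n) if c not in new])
    return new


def cuckoo_init(X, k, n_nests=15, pa=0.25, n_iter=50, seed=0):
    """Return (k initial centroids, their fitness) found by the sketch."""
    if not 0 < k < len(X):
        raise ValueError("need 0 < k < number of objects")
    rng = random.Random(seed)
    n = len(X)
    nests = [rng.sample(range(n), k) for _ in range(n_nests)]
    fit = [cost(X, s) for s in nests]
    for _ in range(n_iter):
        # Each nest lays a Levy-perturbed egg, which replaces a random nest if it is fitter.
        for i in range(n_nests):
            egg = perturb(rng, nests[i], n)
            f = cost(X, egg)
            j = rng.randrange(n_nests)
            if f < fit[j]:
                nests[j], fit[j] = egg, f
        # Abandon the worst fraction pa of nests and rebuild them at random
        # (the current best nest is never among them, so it is kept).
        worst = sorted(range(n_nests), key=fit.__getitem__, reverse=True)[:int(pa * n_nests)]
        for w in worst:
            nests[w] = rng.sample(range(n), k)
            fit[w] = cost(X, nests[w])
    best = min(range(n_nests), key=fit.__getitem__)
    return [list(X[c]) for c in nests[best]], fit[best]
```

The returned centroids would then serve as the initial modes of the subsequent clustering step. Restricting candidates to actual data objects keeps every centroid a valid combination of categories, at the cost of never proposing a mode that no single object matches exactly.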