| The growth of the Internet has brought human society into the era of big data. Data of all kinds are now available, which makes data processing particularly important. Cluster analysis is a commonly used data-processing method whose main purpose is to partition data so that the distance between objects in the same cluster is as small as possible, while the distance between objects in different clusters is as large as possible. Clustering therefore helps us understand the distribution of data more deeply and process them more effectively. The clustering of numerical data has produced fruitful results, but the real world also contains a large amount of categorical data. Because categorical data lack geometric properties, geometric calculations cannot be applied to them directly as they can to numerical data, so clustering algorithms designed for such data are particularly worth studying. In recent years many scholars have studied categorical data with a single attribute, but relatively few studies have addressed categorical data with multiple attributes. This paper studies the clustering of categorical matrix-object data with multiple attributes from two perspectives, hard partitioning and soft partitioning. The main results are as follows. (1) At the hard-partition level, categorical matrix-object data containing multiple feature vectors are described, and a new clustering algorithm for such data based on between-cluster information (between-cluster k-modes, BC-k-modes) is proposed. Following the clustering procedure of the k-modes algorithm, the algorithm clusters categorical matrix-object data, derives update formulas for the membership matrix and the cluster prototypes, and seeks a local optimum of the objective function by incorporating between-cluster information. Experiments on five real data sets show that the algorithm clusters better than the algorithms it was compared with. (2) At the soft-partition level, a new fuzzy between-cluster k-modes (Fuzzy BC-k-modes) clustering algorithm is proposed for categorical matrix objects. The objective function is modified by adding between-cluster information (so that the dissimilarity within clusters is as small as possible and the dissimilarity between clusters is as large as possible) and by introducing the fuzzy factor α (each object has a degree of membership in every cluster, rather than a simple 0-1 assignment). An update formula for the membership matrix is derived while seeking a local optimum of the modified objective function. The validity of the Fuzzy BC-k-modes algorithm is verified on five real data sets and one simulated data set, and the relationship between the fuzzy factor and the membership degrees is analyzed. The two algorithms proposed in this study further enrich research on clustering algorithms for categorical matrix-object data and provide new methods to support databases in fields such as telecommunications, insurance, banking, and healthcare. |
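
For orientation, the standard k-modes procedure on which both proposed algorithms build can be sketched as follows. This is only a minimal illustration of plain k-modes with the simple matching dissimilarity on ordinary categorical records. It is not the BC-k-modes or Fuzzy BC-k-modes algorithm: it contains no between-cluster term, no fuzzy factor α, and no matrix-object handling. The function names, the toy data, and the choice of initial modes are all assumptions made for this example.

```python
from collections import Counter


def matching_distance(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each attribute column of a non-empty list of rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, init_modes, max_iter=100):
    """Plain k-modes: alternate assignment and mode update until labels stop changing."""
    modes = [tuple(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins the cluster with the nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda j: matching_distance(x, modes[j]))
            for x in data
        ]
        if new_labels == labels:
            break  # converged to a local optimum
        labels = new_labels
        # Update step: each prototype becomes the column-wise mode of its members.
        for j in range(len(modes)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mode if a cluster is empty
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy data: (colour, size, shape) records.
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "medium", "square"),
    ("green", "large", "square"),
]
labels, modes = k_modes(data, init_modes=[data[0], data[3]])
print(labels, modes)
```

The hard 0-1 assignment in the first step is what the soft-partition variant described above relaxes into membership degrees. The result depends on the initial modes, which is why only a local optimum is reached.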