| Data clustering is an important branch of data mining. Most existing clustering algorithms are restricted to data with only continuous attributes, and a few are restricted to data with only nominal attributes. When data have mixed attributes, handling only one kind of attribute loses information and degrades the quality of the mining results. Clustering mixed-attribute data therefore remains a challenging problem. The main research work in this article covers the following aspects: 1. The K-prototypes algorithm is introduced, and two improvements to it are proposed. The first is a new algorithm named CVAD (Categorical Value Attributes Decompose), based on the K-prototypes algorithm and the fuzzy K-prototypes algorithm, which overcomes shortcomings of the original method and yields better clustering results. The second is an improved K-prototypes algorithm that chooses initial points by group, further refined by using a genetic algorithm to choose the groups. 2. A mixed-attribute clustering algorithm based on the BIRCH algorithm is proposed; experiments on UCI data sets indicate that it performs well. 3. A mixed-attribute clustering algorithm based on an improved DBSCAN algorithm is proposed and described, and its advantages are pointed out. 4. A mixed-attribute clustering algorithm based on cluster ensembles (Cluster Ensemble-based Mixed Attribute Cluster, CEMC) is proposed. It brings the cluster ensemble methodology to the mixed-attribute clustering problem and extends that methodology: an algorithmic framework is established, an objective function and the corresponding algorithm are proposed, and the validity of the algorithm is tested on real data. |