| As times change, "big data" has become ubiquitous, and data has become the wealth of this era. The key question is how to extract more value from it. As an effective way to obtain that value, data mining is favored across many fields and covers many applications. Cluster analysis, one of the main data mining techniques, has attracted much attention in both academia and industry. Its goal is to group highly similar data into the same class or cluster while keeping the similarity between data in different clusters as low as possible. At present, no single clustering algorithm is applicable to all data types. Clustering ensemble methods address this by combining multiple clustering algorithms, which gives the combined algorithm the required generality. Scholars at home and abroad have proposed many clustering ensemble algorithms. However, because of the sheer volume of data, some noisy data are inevitably extracted in the data preprocessing stage, and these data can prevent the clustering results from satisfying the block-diagonal property, which degrades the final partition produced by the algorithm. This paper builds on clustering ensemble algorithms and targets the problem that the results of the above algorithms fail to satisfy the block-diagonal property. Its main results are as follows: (1) A clustering ensemble algorithm based on structured graph learning at the base-clustering level is proposed. Different base clusterings affect the consensus clustering result differently, so the final consensus association graph is approximated by assigning them different weights. A Laplacian matrix is introduced to constrain the association graph so that it has a block-diagonal structure. Experiments on a large number of datasets verify the effectiveness of the method. (2) A clustering ensemble algorithm based on cluster-level structured graph learning is proposed. The accurate
description of sample similarity is one of the keys to a high-quality clustering ensemble. The method first addresses the inconsistent reliability of clustering results: it assigns a weight to each cluster in each base clustering and continuously optimizes these weights, so that they accurately express each cluster's contribution to the clustering result. Second, the block-diagonal property is used as a prior to ensure that the similarity matrix is partitioned into blocks. Finally, the result is decomposed by symmetric nonnegative matrix factorization to obtain a low-dimensional nonnegative cluster label matrix. Experiments show that the method outperforms other methods on different datasets. (3) An analysis system for unsupervised clustering ensemble algorithms is designed and developed. The system comprises three modules: a data initialization module (dataset selection and parameter setting), an unsupervised clustering ensemble algorithm module, and an algorithm result display module. It integrates the clustering ensemble algorithm based on cluster-level structured graph learning and its seven comparison methods. Results are presented according to the display method and evaluation index selected by the operator. In summary, this paper proposes new methods that aim to improve the consensus function, addressing the above problems in clustering ensembles. One reason for the popularity of clustering ensemble algorithms is that their results are stable and less affected by external factors. The research in this paper is therefore meaningful in terms of both theory and application. |
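
The link between weighted base clusterings, the Laplacian matrix, and the block-diagonal property can be illustrated with a small sketch. It is not the optimization proposed in the abstract: the weights here are fixed by hand rather than learned, and the function names (`co_association`, `weighted_consensus`, `num_blocks`) and the toy labels are assumptions made for illustration. It relies only on the standard fact that the multiplicity of the zero eigenvalue of a graph Laplacian equals the number of connected components, i.e. the number of diagonal blocks in the similarity matrix.

```python
import numpy as np

def co_association(labels):
    """Binary co-association matrix of one base clustering:
    entry (i, j) is 1 when samples i and j share a cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def weighted_consensus(base_clusterings, weights):
    """Weighted average of the co-association matrices (weights are normalized)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * co_association(l) for w, l in zip(weights, base_clusterings))

def num_blocks(S, tol=1e-8):
    """Count diagonal blocks of S as the number of (near-)zero
    eigenvalues of the graph Laplacian L = D - S."""
    S = (S + S.T) / 2                      # enforce symmetry
    L = np.diag(S.sum(axis=1)) - S         # unnormalized Laplacian
    eigvals = np.linalg.eigvalsh(L)
    return int(np.sum(eigvals < tol))

# Toy ensemble over six samples (hypothetical data).
# The third base clustering is noisy: it puts sample 2 with samples 3-5.
base_clusterings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],   # same partition, different label names
    [0, 0, 1, 1, 1, 1],
]

equal = weighted_consensus(base_clusterings, [1, 1, 1])
downweighted = weighted_consensus(base_clusterings, [1, 1, 0])

print(num_blocks(equal))         # noisy link merges the two blocks
print(num_blocks(downweighted))  # block-diagonal structure is recovered
```

With equal weights the noisy base clustering adds a weak link between the two groups, so the consensus graph has a single connected component. Setting its weight to zero restores two blocks, which is the effect that learned weights combined with a Laplacian constraint are meant to achieve.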