| Since the beginning of the twenty-first century, the connections among people, and between people and the physical world, have grown ever closer. As a result, data is generated everywhere. However, while the scale of data has grown almost explosively, data quality has not improved correspondingly and cannot be adequately guaranteed: during initial acquisition, exchange, and dissemination, a variety of conditions can introduce quality problems into the data that is finally obtained. Commonly used clustering algorithms usually require high-quality data to work properly, and they usually perform poorly when big data has quality problems. It is therefore often necessary to clean the data first with data cleaning techniques and only then carry out data mining operations such as clustering. But data cleaning on large-scale data often has a very expensive time overhead, and the final cleaning effect may not be as good as hoped: even after a lot of time has been spent on cleaning, the data may still have quality problems, that is, the cleaning does not significantly improve the quality of the data mining results. Studying clustering operations directly on weakly available data therefore offers a new way to address this problem: clustering is performed on the data without cleaning it first, that is, without requiring clean data.
This paper focuses on how to perform clustering analysis on incomplete data sets. First, it analyzes the spatial structure of incomplete data in order to understand the impact of incomplete data on clustering operations. It then designs an incomplete data clustering algorithm based on fuzzy clustering, which treats the missing values in the data as optimization variables of the iterative clustering process and updates them continuously during the iterations until the clustering of the incomplete data is complete. A further incomplete data clustering algorithm rests on two core requirements of the clustering process: a cluster center must be a point around which the density of surrounding points is high, and its distance to other points must be as large as possible. Once the cluster centers have been determined, the remaining points are assigned to clusters according to a certain strategy. The incomplete data clustering algorithm based on information theory regards clustering as a process in which the uncertainty of a cluster changes: as attributes are added, the uncertainty about a record's category decreases, and the record is finally assigned to the cluster with the least uncertainty. For incomplete data, the basic parameters of information theory and the information parameters of each cluster need to be estimated; combining the two completes the clustering of the incomplete data. After the design of each algorithm, this paper analyzes the algorithm experimentally through related experiments. |
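
The idea of treating missing values as optimization variables in fuzzy clustering can be illustrated with a minimal sketch. This is not the algorithm designed in this paper. It assumes the standard fuzzy c-means updates with Euclidean distance, marks missing entries with `None`, initializes them with the column mean of the observed values, and runs a fixed number of iterations. The function name `fuzzy_cluster_incomplete` and all parameter names are hypothetical.

```python
import math


def fuzzy_cluster_incomplete(rows, init_centers, m=2.0, n_iter=50):
    """Fuzzy c-means sketch in which missing entries (None) are optimized too.

    Assumptions (not from the paper): m > 1, Euclidean distance, and every
    column has at least one observed value.
    """
    n_dim = len(init_centers[0])
    # Start each missing entry at the mean of the observed values in its column.
    col_means = []
    for j in range(n_dim):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(sum(observed) / len(observed))
    data = [[col_means[j] if r[j] is None else r[j] for j in range(n_dim)]
            for r in rows]
    centers = [list(c) for c in init_centers]
    power = 2.0 / (m - 1.0)
    memberships = []

    for _ in range(n_iter):
        # 1. Update fuzzy memberships from the current completed data.
        memberships = []
        for x in data:
            dists = [max(math.dist(x, c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((d_i / d_k) ** power for d_k in dists)
                 for d_i in dists])
        # 2. Update each center as a membership-weighted mean of the data.
        for i in range(len(centers)):
            weights = [u[i] ** m for u in memberships]
            total = sum(weights)
            centers[i] = [sum(w * x[j] for w, x in zip(weights, data)) / total
                          for j in range(n_dim)]
        # 3. Update only the missing entries: the value minimizing the fuzzy
        #    objective is the membership-weighted mean of the center coordinates.
        for k, row in enumerate(rows):
            weights = [u ** m for u in memberships[k]]
            total = sum(weights)
            for j in range(n_dim):
                if row[j] is None:
                    data[k][j] = sum(
                        w * c[j] for w, c in zip(weights, centers)) / total

    # Hard labels: the cluster with the largest membership for each record.
    labels = [max(range(len(centers)), key=lambda i: u[i]) for u in memberships]
    return centers, data, labels
```

Calling `fuzzy_cluster_incomplete(rows, init_centers)` with a list of records and a list of starting centers returns the final centers, the completed data, and a hard label per record. Observed entries are never modified; only the entries given as `None` change between iterations.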