K-means Algorithm Based On Projection Of Clustering Analysis

Posted on: 2013-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Li
Full Text: PDF
GTID: 2248330395473517
Subject: Computational Mathematics
Abstract/Summary:
Data mining is widely studied in the field of machine learning. It closely integrates artificial intelligence and database technology so that computers can intelligently and automatically extract valuable knowledge, models, and rules from the large amounts of data held in databases, to meet people's needs in different applications. The k-means algorithm, the most widely used algorithm in cluster analysis, has the advantages of scalability and high efficiency. However, k-means depends on the value of k, on the choice of initial cluster centers, and on the sample objects selected. It is also very sensitive to isolated points: a few remote isolated points can have a large effect on the result. For large volumes of high-dimensional data, the main obstacle to computational efficiency is the Euclidean distance calculation. This thesis focuses on improving the efficiency of that calculation and on separating isolated points.

The thesis presents improved algorithms that address these shortcomings of k-means: an improved method for selecting the initial cluster centers, a cluster and mean-value separation method that reduces the interference of isolated points, and an improved method based on kernel functions.

The projection-based efficiency improvement applies projection and dimension-reduction theory to k-means. The projection distances of all vectors to be clustered are computed along a selected direction, and an index of the vectors is built on these projection distances. At each iteration, when points are reassigned, the projection distance is checked first, so that points whose projection distance is too far from a cluster center's projection distance are excluded from that center. Points whose projection distance is too far from the projection distances of all cluster centers are marked as isolated points. This both improves the efficiency of the algorithm and reduces the influence of isolated points on it. The test results are also good.
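As a rough illustration of the projection test described in the abstract, the Python sketch below performs one reassignment step. It relies on the fact that, for a unit direction u, |u·x - u·c| <= ||x - c||, so a large gap between two projections proves that a center is far away without computing the full Euclidean distance. The direction `u`, the threshold `tau`, and the function names are assumptions made for this example; the abstract does not say how the thesis chooses them.

```python
import math


def project(v, u):
    """Scalar projection of vector v onto the unit direction u."""
    return sum(a * b for a, b in zip(v, u))


def assign_with_projection(points, centers, u, tau):
    """One k-means reassignment step with a projection pre-check.

    u   : unit-length direction (hypothetical choice, not from the thesis)
    tau : largest projection gap allowed (hypothetical parameter)
    Returns one center index per point; -1 marks an isolated point.
    """
    # Point projections do not change between iterations, so a full
    # k-means loop could compute them once and reuse them.
    point_proj = [project(p, u) for p in points]
    center_proj = [project(c, u) for c in centers]

    labels = []
    for p, pp in zip(points, point_proj):
        best, best_dist = -1, math.inf
        for j, (c, cp) in enumerate(zip(centers, center_proj)):
            # Cheap test first: a projection gap above tau means the
            # true distance is also above tau, so skip this center.
            if abs(pp - cp) > tau:
                continue
            d = math.dist(p, c)  # full Euclidean distance, survivors only
            if d < best_dist:
                best, best_dist = j, d
        labels.append(best)  # stays -1 if every center was skipped
    return labels


points = [(0.0, 0.0), (1.0, 0.5), (10.0, 10.0), (11.0, 9.0), (50.0, 0.0)]
centers = [(0.5, 0.0), (10.5, 9.5)]
u = (1.0, 0.0)  # project onto the first coordinate axis
labels = assign_with_projection(points, centers, u, tau=5.0)
print(labels)  # [0, 0, 1, 1, -1]
```

In this example the last point is more than `tau` away from both centers along `u`, so it is marked isolated and no Euclidean distance is computed for it. The test only rules centers out: a point that passes it may still be farther than `tau` from a center in full distance, so the Euclidean distance is still computed for the centers that remain.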
Keywords/Search Tags: data mining, k-means, projection pursuit, isolated points