K-means Algorithm Based On Projection Of Clustering Analysis

Posted on:2013-05-24

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Li

Full Text:PDF

GTID:2248330395473517

Subject:Computational Mathematics

Abstract/Summary:

Data mining is widely studied in the field of machine learning. It closely integrates artificial intelligence and database technology so that computers can intelligently and automatically help people extract valuable knowledge, models, and rules from the large amounts of data held in databases, meeting the needs of different applications. The k-means algorithm, the most widely used algorithm in cluster analysis, has the advantages of scalability and high efficiency. However, it depends on the value of k, on the choice of initial cluster centers, and on the sample objects being clustered. It is also very sensitive to isolated points: a few remote isolated points can have a large impact on the result. For large volumes of high-dimensional data, the biggest obstacle to computational efficiency is the Euclidean distance calculation.

This thesis focuses on improving computational efficiency and on separating out isolated points, and gives corresponding improved algorithms to address the shortcomings of k-means. These include an improved method for selecting the initial cluster centers, a cluster and mean-value separation method that reduces the interference of isolated points, and an improved method based on kernel functions. The projection-based efficiency improvement applies projection and dimension-reduction theory to k-means. It computes the projection distance of every vector to be clustered along a selected direction and builds an index of the vectors by projection distance. At each iteration, when points are reassigned, the projection distance is checked first, so that points whose projection distance is too far from a cluster center's projection distance are excluded from consideration for that cluster. Points whose projection distance is too far from the projection distances of all cluster centers are marked as isolated points. This both improves the efficiency of the algorithm and reduces the influence of isolated points on it. The test results are also good.
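The abstract does not give the details of the projection test, so the following is only a minimal sketch of one way such a check could work, not the thesis's actual algorithm. It assumes a unit-length projection direction u, for which |u·x - u·c| <= ||x - c||, so a center whose projection gap already exceeds the best distance found so far can be skipped without computing the full Euclidean distance. The function names, the `outlier_gap` threshold, and the rule for marking isolated points are hypothetical choices made for illustration.

```python
import math


def project(point, direction):
    """Scalar projection of `point` onto a unit-length `direction`."""
    return sum(p * d for p, d in zip(point, direction))


def assign_with_projection(points, centres, direction, outlier_gap=None):
    """One k-means assignment step with projection-based pruning (sketch).

    Returns (labels, n_full), where n_full counts the full Euclidean
    distances actually computed. A label of -1 marks a point treated as
    isolated; this only happens when the hypothetical `outlier_gap`
    threshold is supplied.
    """
    centre_proj = [project(c, direction) for c in centres]
    labels = []
    n_full = 0
    for x in points:
        xp = project(x, direction)
        gaps = [abs(xp - cp) for cp in centre_proj]
        # Assumed isolation rule: far from every centre along the direction.
        if outlier_gap is not None and min(gaps) > outlier_gap:
            labels.append(-1)
            continue
        best, best_dist = -1, math.inf
        # Visit centres in order of increasing projection gap.
        for j in sorted(range(len(centres)), key=gaps.__getitem__):
            # The gap is a lower bound on the true distance, so this centre
            # and every later one cannot beat the current best.
            if gaps[j] >= best_dist:
                break
            n_full += 1
            d = math.dist(x, centres[j])
            if d < best_dist:
                best, best_dist = j, d
        labels.append(best)
    return labels, n_full
```

Because the projection gap is only a lower bound, the pruning in this sketch never changes which center is nearest; it only avoids some distance computations. The isolation rule is different in kind: it is a heuristic, since a point can be far from every center along one direction without being an outlier in the full space, and its usefulness depends on the chosen direction and threshold.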

Keywords/Search Tags:

data mining, k-means, projection pursuit, isolated points

Related items

1	The Clustering And The Isolated Points' Detection Based On The Protein-protein Interaction Network
2	Improve The Application Of K-means Algorithm In Text Clustering
3	Studies On Projection Pursuit Method Of Polarimetric SAR Image Classification
4	Improved Particle Swarm Optimization Projection Pursuit Clustering
5	Research And Application Of Projection Pursuit Model
6	Research On Network Anomaly Detection Based On Projection Pursuit Regression
7	Improved K-means Clustering Based On Genetic Algorithm
8	The Research Of Trustworthy Software Estimation Model Based On Improved Projection Pursuit Technique
9	The Research About Partition-based And Density-based Clustering Algorithm
10	Research On Wormhole Detection Mechanism In Ad Hoc Based On Projection Pursuit