The CLIQUE Algorithm And Its Parallelization

Posted on:2004-04-30

Degree:Master

Type:Thesis

Country:China

Candidate:F Zu

Full Text:PDF

GTID:2168360095456634

Subject:Computer system architecture

Abstract/Summary:

Data mining helps people discover the information and knowledge hidden in data. It has become a core technology of business intelligence, is widely used in many fields, and has drawn broad attention from the academic community. Clustering is one of the most important areas of data mining. It finds similarities among data items and uses them to optimize queries over large-scale databases and to uncover hidden, useful information and knowledge. Making clustering faster and its results more accurate is both the most important and the most difficult problem.

CLIQUE is a method that integrates density-based and grid-based clustering. Its advantage is speed, but because it simplifies the procedure, clustering accuracy may suffer. After in-depth investigation and analysis, we found that the drawback of CLIQUE is that it does not take the characteristics of the data being processed into account. It partitions the data with a predefined grid, which increases the computational complexity, and it must then sacrifice accuracy to bring that complexity back down. We introduce an adaptive-grid method to solve this problem. We divide each dimension into fixed-size intervals and join adjacent dense intervals into dense parts. At the boundary of each dense part, the boundary is adjusted by subdividing into smaller intervals. Finally, the adaptive grid is generated from the dense parts. This method makes full use of the characteristics of the data being processed. The numbers of dense units and candidate dense units are greatly reduced, and the computational complexity drops accordingly, so computation in every dimension becomes feasible, which improves clustering accuracy. The computational complexity of the algorithm is still exponential, but because the exponent is the number of dimensions, the complexity is still lower than that of other clustering algorithms.

To make the algorithm more efficient, we parallelized it.
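
The adaptive-grid construction described above can be illustrated for a single dimension. The following is a minimal sketch, not the thesis implementation: the function names `dense_parts` and `refine`, the parameters `min_count`, `sub`, and `min_sub_count`, and the rule of trimming empty sub-intervals from each edge are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def dense_parts(values: Sequence[float], lo: float, hi: float,
                n_bins: int, min_count: int) -> List[Interval]:
    """Split [lo, hi) into n_bins equal intervals and merge runs of
    adjacent dense intervals (count >= min_count) into dense parts."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1

    parts: List[Interval] = []
    start = None  # index of the first interval in the current dense run
    for i, c in enumerate(counts):
        if c >= min_count:
            if start is None:
                start = i
        elif start is not None:
            parts.append((lo + start * width, lo + i * width))
            start = None
    if start is not None:  # a dense run reaches the upper end of the range
        parts.append((lo + start * width, hi))
    return parts


def refine(part: Interval, values: Sequence[float], sub: float,
           min_sub_count: int = 1) -> Interval:
    """Assumed boundary adjustment: trim sub-intervals of width `sub`
    from each edge while they hold fewer than min_sub_count points."""
    left, right = part

    def count(a: float, b: float) -> int:
        return sum(1 for v in values if a <= v < b)

    while right - left > sub and count(left, left + sub) < min_sub_count:
        left += sub
    while right - left > sub and count(right - sub, right) < min_sub_count:
        right -= sub
    return (left, right)


sample = [2.6, 2.7, 2.9, 3.1, 3.5, 3.8, 4.0, 4.2, 4.4, 7.5]
coarse = dense_parts(sample, lo=0.0, hi=10.0, n_bins=10, min_count=3)
print(coarse)                           # [(2.0, 5.0)]
print(refine(coarse[0], sample, 0.25))  # (2.5, 4.5)
```

In this toy data set the coarse grid reports one dense part spanning three unit intervals, and the finer pass pulls both edges in toward the points, so later stages would work with one tighter interval instead of three fixed ones.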
The hardware platform is a set of PCs connected by a LAN, and the software platform is PVM on Linux; together they form the PC-cluster system. The parallel programming model is master/slave. The algorithm assigns a portion of the data set to each node, which realizes data parallelism; task parallelism is used when dense units are generated. Because the algorithm is fully data-parallel, its speedup is nearly linear. The time complexity on each node consists of exponential computation time and linear communication time. Finally, experiments demonstrate the feasibility of the algorithm, and the measured speedup agrees with the theoretical one. The experiments also show that the parallel algorithm improves the accuracy of the clustering result while being more efficient. Because the algorithm is based on a PVM cluster, it is widely applicable.
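
The data-parallel part of the master/slave scheme can be sketched as follows. This is an illustrative approximation only: it replaces PVM message passing with ordinary function calls in one process, and the names `unit_of`, `slave_count`, and `master`, the round-robin partitioning, and the uniform grid of width `width` are assumptions rather than details taken from the thesis.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Unit = Tuple[int, ...]


def unit_of(point: Point, lo: float, width: float) -> Unit:
    """Map a point to the index of the grid unit that contains it."""
    return tuple(int((x - lo) // width) for x in point)


def slave_count(partition: List[Point], lo: float, width: float) -> Counter:
    """Work done on one node: count the points of its partition per unit."""
    return Counter(unit_of(p, lo, width) for p in partition)


def master(points: List[Point], n_slaves: int, lo: float, width: float,
           min_count: int) -> Dict[Unit, int]:
    """Split the data, collect partial counts, and keep the dense units.

    The list comprehension stands in for sending each partition to a
    slave node and receiving its partial counts back.
    """
    partitions = [points[i::n_slaves] for i in range(n_slaves)]
    partials = [slave_count(part, lo, width) for part in partitions]

    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # adds counts unit by unit
    return {unit: n for unit, n in total.items() if n >= min_count}


data = [(0.1, 0.2), (0.3, 0.4), (0.6, 0.7),
        (1.2, 1.5), (1.4, 1.1), (1.9, 1.8), (3.5, 0.5)]
print(master(data, n_slaves=3, lo=0.0, width=1.0, min_count=3))
# {(0, 0): 3, (1, 1): 3}
```

Each partial count depends only on its own partition, and the master only adds small per-unit totals, which is consistent with the abstract's description of the per-node cost as computation plus a comparatively small communication term.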

Keywords/Search Tags:

Data mining, clustering, parallel algorithm, NOW

Related items

1	Parallel Data Mining Theory Research And Application
2	Application And Research Of Parallel Genetic Algorithm In Data Mining Of K-Medians
3	Research On Parallel Optimization Of Clustering Algorithms In Data Mining
4	The Encrypted Block Cipher Algorithm Based On GPU Parallel Clustering Research And Implementation
5	Research On K-MEANS Algorithm Based On GPU Parallel And Its Application In Text Clustering
6	Parallel Research Based On Improved AP Clustering Algorithm And Application In Web Log Mining
7	Research On Clustering Algorithm Based On Data Mining And Its Application
8	Research On Visualization Techniques And Application For Data Mining Based On K-means
9	Research On Parallel Data Mining Algorithm Based On Hadoop
10	Research And Design Of Parallel K-prototypes Clustering Algorithm Based On Hadoop