
Research On Clustering Method Based On Data Field

Posted on: 2010-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: F Guo
Full Text: PDF
GTID: 2178360272479341
Subject: Computer application technology
Abstract/Summary:
With the advent of the information age, the production and collection of massive amounts of data has led to an information explosion, and data mining has become a hot research topic in computer science. Clustering analysis is an important task and method in data mining. The choice of clustering algorithm strongly affects both algorithmic efficiency and clustering quality, and balancing the two remains one of the difficult problems in computer science.

As an important branch of clustering analysis, density-based clustering holds a principal position because it can discover clusters of arbitrary shape and can deal with noisy data effectively. DBSCAN is a classic density-based method. It has the advantages of density-based methods in general and is also fast. However, it has several disadvantages: the clustering parameters are hard to choose; clustering quality is low when the densities of different regions are not equal; the random choice of the initial clustering object wastes time; and the neighborhood search performed for every seed object costs memory and time.

To address these disadvantages, and considering that data objects in a data space are not independent but influence each other, the author combines clustering with data field theory to improve DBSCAN and puts forward a new density-based clustering method based on the data field (DFDBSCAN). The algorithm carries the idea of interaction between material particles, and the field methods used to describe it, over into an abstract data space. It then addresses the inadequacies of DBSCAN by using the relationship between the field potential in the data space and the density distribution of the data.

The algorithm adopts a dynamic strategy to calculate the clustering radius, which solves the problem of unevenly distributed data. At the same time, it uses the relationship between field potential and the density of the data distribution to improve the choice of the initial clustering object and of the seed objects. Therefore, with the same time complexity as DBSCAN, DFDBSCAN improves both clustering quality and clustering efficiency.

Thus, the efficiency of the algorithm is improved to some extent, and the algorithm saves both time and memory. The algorithm is grounded in mathematics and has a theoretical basis. It is also verified with experimental data.
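The idea of using field potential to pick the initial clustering object can be illustrated with a minimal sketch. The abstract does not state the potential function or the selection rule used in DFDBSCAN, so everything below is an assumption: the potential takes a Gaussian-like form, phi(x) = sum_j m_j * exp(-(||x - x_j|| / sigma)^2), and the names `potentials`, `seed_order`, `sigma`, and `masses` are hypothetical. It is not the thesis's implementation.

```python
import math

def potentials(points, sigma=1.0, masses=None):
    """Potential of an assumed Gaussian-like data field at every point.

    Each point contributes m_j * exp(-(d / sigma)^2) to every other
    point, where d is the Euclidean distance between the two points.
    This form is an assumption; the abstract does not specify it.
    """
    if masses is None:
        masses = [1.0] * len(points)  # assume equal mass for all objects
    result = []
    for p in points:
        total = 0.0
        for q, m in zip(points, masses):
            d = math.dist(p, q)
            total += m * math.exp(-(d / sigma) ** 2)
        result.append(total)
    return result

def seed_order(points, sigma=1.0):
    """Indices of points sorted from highest to lowest potential.

    Hypothetical rule: start clustering from high-potential (dense)
    objects instead of picking the initial object at random.
    """
    phi = potentials(points, sigma)
    return sorted(range(len(points)), key=lambda i: phi[i], reverse=True)

# Four tightly grouped points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
phi = potentials(points, sigma=0.5)
order = seed_order(points, sigma=0.5)
```

In this toy data set the four grouped points receive a higher potential than the isolated one, so they come first in `order` and the isolated point comes last. The sketch costs O(n^2) distance evaluations and makes no claim about how the thesis computes potential or the dynamic clustering radius.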
Keywords/Search Tags: clustering analysis, data field, DFDBSCAN, clustering quality