Research of Data Stream Clustering Algorithms Based on Grid

Posted on: 2012-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 2178330332995572
Subject: Applied Mathematics
Abstract/Summary:
With the rapid development of information technology and the spread of computer networks and sensor technology, large numbers of data streams are now produced in telecommunications records, stock trading, network monitoring, web page browsing and similar applications. Unlike traditional static data, a data stream is dynamic, changes quickly, arrives continuously and at high speed, and is large in scale. How to extract useful information from data streams has become a hot topic in data mining. Cluster analysis is an important data mining method that can reveal the underlying distribution patterns users are interested in. However, traditional clustering algorithms cannot be applied directly to data streams, so efficient single-pass scan algorithms need to be designed, which poses unprecedented challenges for data stream clustering.

This thesis first introduces data stream mining and some related techniques. Building on an analysis of traditional clustering algorithms, it then analyzes and compares the advantages and disadvantages of several representative data stream clustering algorithms in terms of processing speed, clustering quality and other aspects. It finds that grid-based clustering algorithms are fast and that density-based clustering algorithms make it easy to find clusters of arbitrary shape. Based on this analysis and on the characteristics of data streams, the thesis completes the following work:

1. The characteristic vector of each grid is updated online, and an exponential decay snapshot algorithm is designed to store snapshot information, so that the density parameters and the type of each grid are determined automatically.
2. In the offline process, the grid summaries stored online are analyzed. Taking the center of gravity of the data as the center, a sub-grid is rebuilt, so that the dense regions of some boundary grids are converted into dense grids and take part in clustering.
3. On the basis of the two points above, a new grid-based data stream clustering algorithm, DSCAG, is designed. The algorithm is tested, and the results show that DSCAG effectively improves clustering quality.
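The abstract does not give the form of the grid characteristic vector or the decay function, so the following is only a minimal Python sketch of the general idea of updating an exponentially decayed grid summary online. The class name DecayedGrid, the parameters cell_width and decay_lambda, the base-2 decay factor, and the choice to store a decayed coordinate sum (from which a center of gravity can be derived) are all assumptions made for illustration; they are not taken from the thesis and should not be read as the DSCAG algorithm itself.

```python
import math


class DecayedGrid:
    """Hypothetical online grid summary with exponential decay (illustration only)."""

    def __init__(self, cell_width, decay_lambda):
        self.cell_width = cell_width      # assumed: uniform cell width in every dimension
        self.decay_lambda = decay_lambda  # assumed: decay rate, larger means faster forgetting
        # cell index -> (decayed density, time of last update, decayed coordinate sum)
        self.cells = {}

    def cell_of(self, point):
        """Map a point to the integer index of the grid cell that contains it."""
        return tuple(math.floor(x / self.cell_width) for x in point)

    def insert(self, point, t):
        """Single-pass update: fade the old summary to time t, then add the new point."""
        key = self.cell_of(point)
        density, last_t, coord_sum = self.cells.get(key, (0.0, t, [0.0] * len(point)))
        fade = 2.0 ** (-self.decay_lambda * (t - last_t))
        density = density * fade + 1.0
        coord_sum = [s * fade + x for s, x in zip(coord_sum, point)]
        self.cells[key] = (density, t, coord_sum)
        return key

    def density(self, key, t):
        """Decayed density of a cell as seen at time t."""
        density, last_t, _ = self.cells[key]
        return density * 2.0 ** (-self.decay_lambda * (t - last_t))

    def center_of_gravity(self, key):
        """Decay-weighted mean position of the points summarized in a cell."""
        density, _, coord_sum = self.cells[key]
        return [s / density for s in coord_sum]
```

In a sketch like this, an offline step could compare density(key, t) against thresholds to label cells as dense, boundary or sparse, and could use center_of_gravity(key) as the center of a rebuilt sub-grid. How DSCAG actually chooses its thresholds and sub-grids is not stated in the abstract.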
Keywords/Search Tags:data stream, clustering, grid, density