Research On Data Stream Clustering Based On Density And Grid

Posted on:2013-02-24

Degree:Master

Type:Thesis

Country:China

Candidate:H Y Xu

Full Text:PDF

GTID:2248330392451327

Subject:Computer application technology

Abstract/Summary:

In recent years, the rapid development of computer and information technology has greatly improved people's ability to obtain data. The data stream is an important type of data source and is attracting more and more attention. Stream data is continuous, fast-changing, ordered and very large in volume, which makes it quite different from traditional static data stored on disk. Data mining on data streams has become a very active field, and data stream clustering is one of its most active research topics.

The goal of this thesis is to design and develop a data stream clustering algorithm that is both accurate and fast. To this end, we did the following work. We discuss the research background and significance of the problem. We summarize the advantages, disadvantages and applicability of several popular types of clustering algorithms. We study the characteristics of data streams and the key technical points of data stream clustering. On this basis, we propose TD-Stream, a data stream clustering algorithm based on density and grid.

TD-Stream borrows its framework from the CluStream algorithm and is divided into an online layer and an offline layer. The two layers work together to balance accuracy and speed. The online layer reads the data stream rapidly and stores the relevant information in a synopsis data structure. By introducing the "trend degree", we improve the method of computing grid density used in traditional density-grid based clustering algorithms: the new data reading algorithm computes the trend degree of each new data point and uses it to map the point to the correct grid. This addresses two problems caused by an absolute grid partition: a single grid belonging to more than one class, and the loss of information at grid edges. Using the synopsis data structure stored by the online layer, the offline layer provides accurate clustering. Because a density-based clustering algorithm is used, the system can handle datasets of arbitrary shape. With the concepts of grid frame and evolution difference, the system can also meet the need for clustering and evolution analysis of historical stream data. As a result, the high efficiency of grid-based algorithms is retained while clustering accuracy is raised significantly.

Finally, we ran experiments with TD-Stream on both synthetic and real datasets. The results show that the algorithm is accurate and efficient and can cluster data streams effectively.
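
The two-layer design described above can be illustrated with a minimal sketch. It is not the TD-Stream implementation: the abstract does not define the trend degree, so the sketch leaves it out and uses a plain exponential fading factor, a common choice in density-grid stream clustering, as a stand-in. The names GridSynopsis, offline_cluster, cell_size, decay and threshold are all hypothetical.

```python
from collections import deque
from itertools import product


class GridSynopsis:
    """Online layer (assumed form): one decayed density value per occupied grid."""

    def __init__(self, cell_size=1.0, decay=0.998):
        self.cell_size = cell_size  # assumed side length of a grid cell
        self.decay = decay          # assumed fading factor in (0, 1]
        self.density = {}           # cell -> density at its last update
        self.last_seen = {}         # cell -> timestamp of its last update

    def cell_of(self, point):
        # Absolute grid mapping; the thesis refines this step with the trend degree.
        return tuple(int(x // self.cell_size) for x in point)

    def insert(self, point, t):
        cell = self.cell_of(point)
        elapsed = t - self.last_seen.get(cell, t)
        faded = self.density.get(cell, 0.0) * self.decay ** elapsed
        self.density[cell] = faded + 1.0
        self.last_seen[cell] = t

    def dense_cells(self, t, threshold):
        # Fade every stored density to time t before comparing with the threshold.
        return {
            cell
            for cell, d in self.density.items()
            if d * self.decay ** (t - self.last_seen[cell]) >= threshold
        }


def offline_cluster(dense):
    """Offline layer (assumed form): group dense cells that touch each other."""
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            # Visit all neighbours, including diagonal ones.
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbour = tuple(c + o for c, o in zip(cell, offset))
                if neighbour in dense and neighbour not in labels:
                    labels[neighbour] = next_label
                    queue.append(neighbour)
        next_label += 1
    return labels


synopsis = GridSynopsis(cell_size=1.0, decay=0.998)
for t, point in enumerate([(0.1, 0.1), (0.2, 0.3), (1.5, 0.5), (1.6, 0.4)]):
    synopsis.insert(point, t)
clusters = offline_cluster(synopsis.dense_cells(t=3, threshold=1.5))
```

Because connected dense grids are merged regardless of their overall layout, a sketch of this kind can recover clusters of arbitrary shape, which is the property the abstract attributes to the density-based offline step.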

Keywords/Search Tags:

Clustering, Data Stream, Grid, Density, Trend Degree

Related items

1	Research On Data Stream Clustering Algorithm Based On Density Grid
2	Research On Data Stream Clustering Algorithm Based On Density Grid Over Sliding Window
3	Data Stream Clustering Algorithm Based On Active Grid-density
4	Research On Data Stream Clustering Based On Grid And Density
5	Research On Clustering Method Of Datastream Based On Grid And Density
6	The Research Of Grid-based Parallel Clustering Algorithm And Clustering For Data Stream
7	Research On Grid And Density Based Data Stream Clustering Algorithm
8	Research And Improvement On Stream Data Clustering Algorithm
9	Research On Data Stream Clustering Algorithm Based On Similarity And Grid Partition Optimization
10	Research On Dynamic Measurement Based Data Stream Clustering And Its Applications