Data Stream Mining Algorithm

Posted on:2009-07-28

Degree:Master

Type:Thesis

Country:China

Candidate:X Z He

Full Text:PDF

GTID:2208360245461021

Subject:Operational Research and Cybernetics

Abstract/Summary:


In recent years, advances in data-gathering technology have led many applications to generate large volumes of streaming data, and analyzing and mining such data has become an active research topic. Compared with traditional static databases, data streams have the following characteristics: (1) a potentially unbounded volume of data; (2) a high arrival rate; (3) historical data cannot be scanned repeatedly.

These characteristics impose the following basic requirements on data stream mining algorithms. First, the algorithms must keep up with rapidly arriving data, so their computational complexity must be low. Second, limited memory cannot store an unbounded volume of data, so their space complexity must also be low: the algorithms should maintain a small approximate summary of the data from which approximate solutions can be obtained. Third, data streams change over time, so the parameters of the algorithms should be adjusted dynamically to follow these changes.

Traditional data mining algorithms can hardly meet all of these requirements at once, so they must be improved, or new algorithms designed, for this kind of data. Research on data stream mining has made great progress in recent years, but the new methods still have many limitations and can handle only certain kinds of data streams.

The main contributions of this thesis are as follows:

1. This thesis presents an algorithm for visualizing high-dimensional data streams of mixed type. The algorithm adjusts its parameters to the incoming data and maps numerical and categorical data into color space using different methods, so that different data can be distinguished from one another. This yields a color matrix of recent data, from which a view of the recent data is finally drawn.

2. This thesis proposes HSFC (High-dimensional Stream Fading Core clustering).
First, the concept of a cluster core is defined; on this basis, the class label of each incoming data point is determined by a "targeting" method. In this algorithm, all parameters and data structures fade over time and are readjusted using the corresponding fading factors. Experimental results show that HSFC adapts to changes in the data stream and produces good clustering results.
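The abstract does not say how the visualization algorithm maps values to color, only that numerical and categorical attributes are handled by different methods and that parameters adapt to incoming data. The following is a minimal sketch under stated assumptions, not the thesis's method: numerical values are rescaled with a running minimum/maximum (one possible reading of "adjusts its parameters") and drawn as grey levels, while each categorical value gets its own fully saturated hue spaced by the golden ratio. The class `StreamColorMapper`, the `window` size, and both color schemes are hypothetical.

```python
import colorsys
from collections import deque

GOLDEN = 0.618033988749895  # golden-ratio step keeps successive category hues far apart


class StreamColorMapper:
    """Hypothetical sketch: turn mixed-type stream records into a color matrix."""

    def __init__(self, kinds, window=100):
        self.kinds = kinds                   # "num" or "cat" per attribute (assumed known)
        self.lo = [None] * len(kinds)        # running minimum of each numeric attribute
        self.hi = [None] * len(kinds)        # running maximum of each numeric attribute
        self.codes = [{} for _ in kinds]     # category value -> index, per attribute
        self.matrix = deque(maxlen=window)   # rows for the most recent `window` records

    def _num_color(self, j, x):
        # Adapt the scale to the data seen so far, then encode magnitude as a grey level.
        self.lo[j] = x if self.lo[j] is None else min(self.lo[j], x)
        self.hi[j] = x if self.hi[j] is None else max(self.hi[j], x)
        span = self.hi[j] - self.lo[j]
        v = 0.5 if span == 0 else (x - self.lo[j]) / span
        return (v, v, v)

    def _cat_color(self, j, c):
        # Give each new category the next index and a distinct, fully saturated hue.
        k = self.codes[j].setdefault(c, len(self.codes[j]))
        return colorsys.hsv_to_rgb((k * GOLDEN) % 1.0, 1.0, 1.0)

    def push(self, record):
        row = tuple(
            self._num_color(j, x) if kind == "num" else self._cat_color(j, x)
            for j, (kind, x) in enumerate(zip(self.kinds, record))
        )
        self.matrix.append(row)  # the oldest row drops out once the window is full
```

Keeping numerical attributes unsaturated (grey) and categorical ones saturated is one simple way to make the two data types visually distinguishable; plotting `matrix` row by row would give the "view of recent data".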
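The abstract describes HSFC only in outline: cluster cores, label assignment by "targeting", and fading of all statistics. The sketch below is an illustrative assumption, not the HSFC algorithm itself. It uses the exponential decay 2^(-λΔt) common in damped-window stream clustering (for example DenStream), interprets "targeting" as assigning a point to the nearest core within a fixed radius, and discards cores whose faded weight drops below a threshold. The names `FadingCore`, `FadingCoreClusterer`, `radius`, `lam`, and `min_weight` are hypothetical.

```python
import math


class FadingCore:
    """A cluster core whose statistics decay exponentially with time (assumed form)."""

    def __init__(self, point, t):
        self.w = 1.0            # faded weight (decayed count of absorbed points)
        self.ls = list(point)   # faded linear sum of absorbed points
        self.t = t              # time of the last update

    def fade(self, t, lam):
        # Multiply all statistics by the fading factor 2^(-lam * elapsed time).
        f = 2.0 ** (-lam * (t - self.t))
        self.w *= f
        self.ls = [v * f for v in self.ls]
        self.t = t

    def center(self):
        return [v / self.w for v in self.ls]

    def absorb(self, point):
        self.w += 1.0
        self.ls = [a + b for a, b in zip(self.ls, point)]


class FadingCoreClusterer:
    """Hypothetical sketch: label points by 'targeting' the nearest core within a radius."""

    def __init__(self, radius=1.0, lam=0.01, min_weight=0.1):
        self.radius = radius          # assumed maximum distance for a point to hit a core
        self.lam = lam                # assumed fading factor
        self.min_weight = min_weight  # cores lighter than this are discarded
        self.cores = {}               # label -> FadingCore
        self.next_label = 0

    def insert(self, point, t):
        for core in self.cores.values():
            core.fade(t, self.lam)
        # Drop cores that have faded away, so old structure does not persist.
        self.cores = {k: c for k, c in self.cores.items() if c.w >= self.min_weight}
        best, best_d = None, math.inf
        for label, core in self.cores.items():
            d = math.dist(point, core.center())
            if d < best_d:
                best, best_d = label, d
        if best is not None and best_d <= self.radius:
            self.cores[best].absorb(point)
            return best
        # No core was hit: open a new core with a fresh label.
        label = self.next_label
        self.next_label += 1
        self.cores[label] = FadingCore(point, t)
        return label
```

Because the weight and linear sum fade by the same factor, a core's center is unchanged by fading while its influence shrinks, so cores that stop receiving points eventually disappear and the clustering follows changes in the stream.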

Keywords/Search Tags:

data stream, data mining, visualizing, clustering


Related items

1	Study On Key Technologies Of Frequent Items Mining And Clustering On Data Streams
2	A Density-Based Clustering Algorithm Over Stream Data
3	Research On Dynamic Measurement Based Data Stream Clustering And Its Applications
4	Adaptive Evolving Data Stream Algorithm Based On Time Decay Window
5	Research On An Application Of Data Stream Query And Data Stream Mining In Oil Field
6	Research On Data Stream Clustering And Its Applications Based On Correlations
7	Study On Data Stream Techniques And Its Application In Electric Power Information Processing
8	Research Of Evolving Data Stream Clustering
9	Study On Clustering Algorithm Adapt To High-speed Data Stream
10	Analysis Of The Clustering Algorithm On Data Stream Using Resilient Distributed Datasets