| In recent years, advances in data-gathering technology have led many applications to generate large volumes of streaming data, and analyzing and mining such data has become an increasingly active research topic. Compared with traditional static databases, data streams have the following characteristics: (1) a potentially unbounded volume of data; (2) a rapid arrival rate; (3) the impossibility of repeatedly scanning historical data. These characteristics impose the following basic requirements on stream mining algorithms. First, the algorithms must keep up with rapidly arriving data, so their computational complexity should be low. Second, limited memory cannot store an unbounded volume of data, so their space complexity should be low and they should maintain a compact summary of the data from which approximate solutions can be obtained. Third, data streams change over time, so the parameters of the algorithms should be adjusted dynamically to follow such changes. Traditional data mining algorithms can hardly meet all of these requirements at the same time, and therefore algorithms must be adapted or newly designed for this kind of data. Research on data stream mining has made great progress in recent years. However, the new methods still have many limitations and can handle only limited kinds of data streams. The main contributions of this thesis are as follows: 1. This thesis presents an algorithm for visualizing high-dimensional data streams of mixed type. The algorithm adjusts its parameters to the incoming data and maps numerical and categorical data into color space using different methods, ensuring that different data can be distinguished from one another. A color matrix of the recent data is thus maintained, from which a visualization of the recent data is obtained. 2. This thesis proposes the HSFC algorithm, short for High-dimensional Stream Fading Core clustering.
First, the concept of a cluster core is defined, on the basis of which the class label of each incoming data point is determined by a method called "targeting". In this algorithm, all parameters and data structures fade over time and are readjusted with the corresponding fading factors. Empirical results show that HSFC can cope with changes in the data streams and obtain good clustering results. |