| Data stream clustering is an important research area within data stream mining. Existing algorithms, in both domestic and international research, still have several problems: the inherent sparsity of high-dimensional data is not handled well, clustering is inefficient, the supported data types are limited to numerical data, and users' needs are not fully met. To address these problems, this paper focuses on subspace-based clustering of data streams. This research is relevant to e-commerce, network communication, business intelligence, and related fields. First, clustering efficiency and accuracy are strongly affected by the high volatility of the data stream flow rate, by the resource-constrained clustering environment, and by the sparseness of high-dimensional data streams. To solve these problems, we propose a new adaptive subspace-based algorithm for high-dimensional data streams, called SAStream. We improve the cluster structure of HPStream and define candidate clusters. The distance from each newly arriving data point is computed only to the centroids of the candidate clusters rather than to those of all clusters, which reduces the number of clusters examined during clustering. The created clusters are stored in a pyramidal time frame, and a time fading function is used to discount the history of past behavior. When the data rate is high, the LimitingRadius and the cluster selection factor are adjusted automatically, and the clustering granularity is adjusted continuously. Second, to cluster high-dimensional categorical data streams, we propose a new algorithm called SUBCStream. We redefine the compressed storage structure of the clusters, using a symbol matrix and a frequency matrix to store the data. Clusters and their maximal relevant subspaces are found by minimizing an objective function. The additivity property of the cluster structure is used to merge cluster structures or add new data points.
To discount the history of past behavior and reduce the maintenance cost, we add a fading function to every cluster. Finally, the SAStream and SUBCStream algorithms are implemented in Java. All experiments are performed on real and synthetic datasets. The experimental results demonstrate the feasibility and effectiveness of our algorithms. |
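
The fading and additivity ideas described for SUBCStream can be illustrated with a minimal Python sketch. It is not the authors' Java implementation. The class name `FadingCategoricalCluster`, the per-dimension counters (standing in for the frequency matrix), the `decay_rate` parameter, and the exponential fading function `2 ** (-decay_rate * elapsed)` are all assumptions made for illustration, since the abstract does not give the exact definitions.

```python
from collections import Counter


class FadingCategoricalCluster:
    """Hypothetical summary of one cluster of categorical records.

    Assumption: history is discounted with f(dt) = 2 ** (-decay_rate * dt).
    """

    def __init__(self, num_dims, decay_rate=0.5):
        self.decay_rate = decay_rate
        self.weight = 0.0  # faded number of points in the cluster
        # One counter per dimension: symbol -> faded frequency.
        self.freq = [Counter() for _ in range(num_dims)]
        self.last_update = 0.0

    def fade_to(self, now):
        """Discount all stored statistics up to time `now`."""
        factor = 2.0 ** (-self.decay_rate * (now - self.last_update))
        self.weight *= factor
        for counts in self.freq:
            for symbol in counts:
                counts[symbol] *= factor
        self.last_update = now

    def add_point(self, record, now):
        """Additivity: absorbing a point only increments the summary."""
        self.fade_to(now)
        self.weight += 1.0
        for counts, symbol in zip(self.freq, record):
            counts[symbol] += 1.0

    def merge(self, other, now):
        """Additivity: merging two clusters sums their faded summaries.

        Note that `other` is also faded to `now` as a side effect.
        """
        self.fade_to(now)
        other.fade_to(now)
        self.weight += other.weight
        for mine, theirs in zip(self.freq, other.freq):
            mine.update(theirs)  # Counter.update adds counts
```

In this sketch, fading is applied lazily, only when a cluster is touched, so idle clusters cost nothing between updates. This is one possible way to keep maintenance cost low, not necessarily the one used in the paper.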