| Clustering is a very important method in data mining, but traditional clustering methods can only handle static data. As technology develops, many domains, such as the Internet, data transmission, call records, and website log files, continuously produce large amounts of data. Such data are massive and unbounded, and they arrive at an unpredictable rate. A well-designed clustering method for the data stream environment is therefore very valuable. Mixed-attribute stream data are being produced in ever greater volumes, yet relatively little clustering research addresses them. No specific description of the clustering process for mixed-attribute data streams has been found, and there is considerable room to improve both the standardization of such data and the algorithms used to cluster them.
To address these problems, the main contributions of this work are as follows. 1. It introduces the basic concepts of data stream mining, methods of similarity measurement, and the basic clustering algorithms for data streams, then summarizes the related techniques for processing data streams, providing a foundation for the rest of the work. 2. Based on the characteristics of data streams and the changes in data distribution within a stream, a three-tier clustering framework based on micro-cluster optimization is proposed, and the meaning of each tier is described in detail. After analyzing and summarizing the traditional k-nearest neighbor method, an optimal 2k-nearest neighbors clustering algorithm is proposed. The algorithm adjusts the micro-cluster radius adaptively by analyzing the distribution of the data within the 2k nearest neighbors. 3. Because no specific clustering process for mixed-attribute data streams has been found, a three-step clustering approach is proposed. Since the traditional k-nearest neighbor method is not suited to mixed-attribute data streams, the concept of double k-nearest neighbors is then proposed, and online micro-clustering is performed with an improved dimension distance. 4. By improving the cosine formula that traditional algorithms use to judge the similarity between data objects in mixed-attribute data streams, a mean-value-based cosine model is proposed. Experiments show that this method improves clustering results on both real-world and synthetic datasets. |
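The abstract does not define the mean-value-based cosine model of contribution 4, so the sketch below is only an illustration under an assumed reading, not the thesis's method. It assumes each object has numerical and categorical attributes. It assumes "based on mean value" means subtracting running per-attribute means from the numerical part before taking the cosine, which is the known adjusted (mean-centered) cosine. It scores the categorical part by simple matching and combines the two weighted by attribute count. All names (`RunningMean`, `mean_centered_cosine`, `categorical_match`, `mixed_similarity`) and the weighting scheme are hypothetical.

```python
import math
from typing import List, Sequence


class RunningMean:
    """Incrementally maintained per-attribute mean, suitable for a stream (hypothetical helper)."""

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.mean: List[float] = [0.0] * dim

    def update(self, x: Sequence[float]) -> None:
        # Incremental mean update: mean_n = mean_{n-1} + (x - mean_{n-1}) / n
        self.n += 1
        for i, v in enumerate(x):
            self.mean[i] += (v - self.mean[i]) / self.n


def mean_centered_cosine(x: Sequence[float], y: Sequence[float],
                         means: Sequence[float]) -> float:
    """Cosine of x and y after subtracting per-attribute means (adjusted cosine)."""
    if not (len(x) == len(y) == len(means)):
        raise ValueError("numerical parts must have equal length")
    dx = [a - m for a, m in zip(x, means)]
    dy = [b - m for b, m in zip(y, means)]
    dot = sum(a * b for a, b in zip(dx, dy))
    nx = math.sqrt(sum(a * a for a in dx))
    ny = math.sqrt(sum(b * b for b in dy))
    if nx == 0.0 or ny == 0.0:
        # An object sitting exactly at the mean has no direction; treat as neutral.
        return 0.0
    return dot / (nx * ny)


def categorical_match(x: Sequence[str], y: Sequence[str]) -> float:
    """Fraction of categorical attributes whose values coincide (simple matching)."""
    if len(x) != len(y):
        raise ValueError("categorical parts must have equal length")
    if not x:
        return 0.0
    return sum(a == b for a, b in zip(x, y)) / len(x)


def mixed_similarity(num_x: Sequence[float], cat_x: Sequence[str],
                     num_y: Sequence[float], cat_y: Sequence[str],
                     means: Sequence[float]) -> float:
    """Attribute-count-weighted blend of the numerical and categorical scores."""
    p, q = len(num_x), len(cat_x)
    if p + q == 0:
        return 0.0
    s_num = mean_centered_cosine(num_x, num_y, means) if p else 0.0
    s_cat = categorical_match(cat_x, cat_y) if q else 0.0
    return (p * s_num + q * s_cat) / (p + q)


if __name__ == "__main__":
    stats = RunningMean(2)
    stream = [([1.0, 2.0], ["a", "b"]), ([3.0, 4.0], ["a", "c"])]
    for num, _cat in stream:
        stats.update(num)  # means are refreshed as objects arrive
    (nx_, cx_), (ny_, cy_) = stream
    print(mixed_similarity(nx_, cx_, ny_, cy_, stats.mean))
```

With this assumed centering, the plain cosine of `[1, 2]` and `[3, 4]` is about 0.98. Once the stream mean `[2, 3]` is subtracted, the two objects lie on opposite sides of the mean and their cosine becomes -1. This is the kind of distinction a mean-based adjustment can expose. The numerical score lies in [-1, 1] and the matching score in [0, 1], so the blended value lies in [-1, 1].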