Research On Data Stream And Its Mixed Attributes Clustering Algorithm

Posted on:2013-04-19

Degree:Master

Type:Thesis

Country:China

Candidate:X Q Shen

Full Text:PDF

GTID:2248330395473340

Subject:Computer software and theory

Abstract/Summary:

Clustering is one of the most important methods in data mining, but traditional clustering methods can only handle static data. As technology develops, many sources, such as the Internet, data transmission, call records, and website log files, produce data continuously. Such data is massive and unbounded, and it arrives at an unpredictable speed. A well-designed clustering method for the data stream environment is therefore very valuable.

Mixed-attribute stream data is increasingly common, yet research on clustering it is relatively scarce. No specific description of the clustering process for mixed-attribute data streams has been found, and there remains considerable room for improvement in both data standardization and the clustering algorithms themselves.

To address these problems, the main contributions of this work are:

1. The basic concepts of data stream mining, methods of similarity measurement, and the basic data stream clustering algorithms are introduced, and the related data stream processing technologies are summarized, providing a basis for the work that follows.

2. Based on the characteristics of data streams and the way the data distribution changes within them, a three-tier clustering framework based on micro-cluster optimization is proposed, and the meaning of the three-tier framework is described in detail. After analyzing and summarizing the traditional k-nearest neighbor method, an optimal 2k-nearest neighbors clustering algorithm is proposed. The algorithm adaptively adjusts the micro-cluster radius by analyzing the distribution of data among the 2k nearest neighbors.

3. Because no specified clustering process for mixed-attribute data streams has been found, a three-step clustering approach is proposed. Then, because the traditional k-nearest neighbor method is not suited to mixed-attribute data streams, the concept of double k-nearest neighbors is proposed, and online micro-clustering is performed with an improved dimension distance.

4. A mean-based cosine model is proposed by improving the cosine formula that traditional algorithms use to judge the similarity between data objects in a mixed-attribute data stream. Experiments demonstrate that the method improves clustering results on both real and artificial datasets.
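
The abstract does not give the formula for the mean-based cosine model or for the improved dimension distance, so the following Python sketch is only an illustration under stated assumptions, not the thesis's method. It assumes that "based on mean value" means subtracting each vector's own mean before taking the cosine, and that a mixed-attribute record combines this numeric term with a simple mismatch fraction over the categorical attributes. The function names `mean_centered_cosine` and `mixed_dissimilarity`, and the weighting by attribute count, are hypothetical.

```python
import math


def mean_centered_cosine(x, y):
    """Cosine similarity after subtracting each vector's own mean.

    Assumption: this is one plausible reading of a "mean-based cosine
    model"; the abstract does not state the actual formula.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cx = [a - mean_x for a in x]
    cy = [b - mean_y for b in y]
    dot = sum(a * b for a, b in zip(cx, cy))
    norm_x = math.sqrt(sum(a * a for a in cx))
    norm_y = math.sqrt(sum(b * b for b in cy))
    if norm_x == 0 or norm_y == 0:
        # A constant vector has no direction once centered.
        return 0.0
    return dot / (norm_x * norm_y)


def mixed_dissimilarity(p, q, numeric_idx, categorical_idx):
    """Dissimilarity in [0, 1] between two mixed-attribute records.

    Numeric part: (1 - centered cosine) / 2.
    Categorical part: fraction of attributes whose values differ.
    The two parts are averaged, weighted by attribute count (an assumption).
    """
    n, c = len(numeric_idx), len(categorical_idx)
    if n + c == 0:
        raise ValueError("at least one attribute index is required")
    num_part = 0.0
    if n:
        num_p = [p[i] for i in numeric_idx]
        num_q = [q[i] for i in numeric_idx]
        num_part = (1.0 - mean_centered_cosine(num_p, num_q)) / 2.0
    cat_part = 0.0
    if c:
        mismatches = sum(1 for i in categorical_idx if p[i] != q[i])
        cat_part = mismatches / c
    return (n * num_part + c * cat_part) / (n + c)


# Two hypothetical stream records: three numeric and two categorical fields.
record_a = (1.0, 2.0, 3.0, "tcp", "http")
record_b = (2.0, 4.0, 6.0, "tcp", "ftp")
score = mixed_dissimilarity(record_a, record_b, [0, 1, 2], [3, 4])
```

In an online micro-clustering step, a score like this could be compared against a micro-cluster's radius to decide whether an arriving record joins an existing micro-cluster or starts a new one. How the thesis sets that radius from the 2k nearest neighbors is not specified in the abstract.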

Keywords/Search Tags:

data stream, mixed attributes, framework, micro-cluster, clustering

Related items

1	Research On Clustering Algorithms For The Data With Multidimensional Mixed Attributes
2	Research On Data And Data Stream Clustering Algorithms For Mixed Attributes
3	Research On Partitioning Clustering Algorithms For Data With Mixed Numerical And Categorical Attributes
4	Research On Frequent Items Mining And Clustering Algorithms Of Data Stream
5	Research On Clustering Algorithm For Mixed Attributes And Application
6	Research On Clustering Ensemble Of Mixed Data And Clustering Algorithm Of Mixed Data Streams
7	Research And Application Of Rough Clustering Methods Of Mixed Attribute Data With Self-adaptive Cluster Adjustment
8	Research On Uncertain Data Streams Clustering Algorithm Based On Tuple Cluster Feature
9	A Study Of The Clustering Algorithm For Mixed Data
10	Research And Application On Mixed Data Clustering Algorithm Based On Intra-Cluster And Inter-Cluster Information