Study On A Tree-based Incremental Three-way Decision Clustering Algorithm

Posted on:2016-06-11

Degree:Master

Type:Thesis

Country:China

Candidate:C Zhang

Full Text:PDF

GTID:2298330452967713

Subject:Computer technology

Abstract/Summary:

PDF Full Text Request

Most existing clustering algorithms are static clustering algorithms and they dealwith the static data set, that is to say, the data objects are no longer changing after beingprocessed. But in many practical applications, the data set is dynamic, which means thatthe previous learned patterns or knowledge need to be dynamically updated, so thetraditional static clustering algorithms are no longer suitable to deal with this type ofdata set, because they have to start from scratch to cluster. This will lead to a largenumber of repeated calculation, thus reduces the time efficiency of the algorithms.In addition, most existing clustering algorithms are classical clustering algorithms,these clustering algorithms adopt two-way decisions, namely each data object eitherbelongs to one cluster or not. In other words, these clustering algorithms are hardclustering algorithms, they simply assign each data object into only one cluster.However, in many practical applications, such as the network structure analysis, socialnetwork, biological sciences and so on, each data object may belong to more thanone cluster, classical hard clustering algorithms cannot solve this case.Therefore, the purpose of this study is to investigate the incremental andoverlapping clustering problems in clustering analysis. The contributions of this workare as follows.1. For overlapping clustering problems, this thesis presents a newstatic overlapping clustering algorithm based on representative points. Firstly, thealgorithm finds the representative points in data set. Then it calculates the similar valuesbetween the representative points, then adds some strongly connected edges or weaklyconnected edges between representative points using the three-way decision theoryaccording to the similar values between them. Finally, it finds sub-graphs of therepresentative points graph to complete the clustering of data set.2. For incremental clustering problem, this thesis proposes a new incrementaloverlapping clustering algorithm based on the structure of tree. A searching tree iscreated which is mainly used for storing information related to representative areas ofrepresentative points, the tree helps to reduce the time of searching similar space withthe new data objects. The method of creating searching tree is similar to that of creatingdecision tree, which is built using top-down approach according to the order of attribute importance. The soft incremental clustering algorithm proposed includes three steps. Itfirstly looks for representative points in new data set. Then for each representative point,it finds and updates the searching tree to find its neighbor representative points. Finally,it calculates the similar value between the new representative point and its neighborrepresentative points and uses the idea of three-way decision to cluster the newrepresentative points.The proposed method does not need to define the number of clusters in the data setin advance, it can dynamically determine the number of clusters in the process ofcalculation. The visualization experiments on artificial data set show that the proposedmethod not only can identify clusters of arbitrary shapes, but also can merge smallclusters into the big one when the data changes; the proposed method can detect newclusters which might be the result of splitting or new patterns. In addition, theexperiments on real UCI data sets show that the proposed method has betterperformance especially on F-measure and NMI indices than the compared methods inmost cases.

Keywords/Search Tags:

incremental clustering, overlapping clustering, searching tree, three-way decision theory

PDF Full Text Request

Related items

1	Research On Improvement Of Three-Way Decision Clustering Algorithms And Applications
2	Research On Incremental Clustering For Big Dataset
3	Research On Dynamic Clustering And Incremental In Data Mining
4	Research On Incremental Mining Algorithm And Application For Dynamic Databases
5	Research On Incremental Clustering Algorithm In Data Mining
6	Applications Of Clustering And Decision Tree Methods In CRM System Of Power Supply Enterprise
7	Research On Overlapping Clustering And Attribute Graph Clustering Algorithms
8	Research On Incremental Thesis Homonym Disambiguation Method Based On Pre Training Model And Decision Tree
9	Research And Realization Of Network Consensus Monitor System Based On The Incremental Text Mining
10	Research On Application Of Fuzzy Clustering And Decision Tree Classification Algorithm In Higher Vocational Students' Employment