| Most existing clustering algorithms are static clustering algorithms and they dealwith the static data set, that is to say, the data objects are no longer changing after beingprocessed. But in many practical applications, the data set is dynamic, which means thatthe previous learned patterns or knowledge need to be dynamically updated, so thetraditional static clustering algorithms are no longer suitable to deal with this type ofdata set, because they have to start from scratch to cluster. This will lead to a largenumber of repeated calculation, thus reduces the time efficiency of the algorithms.In addition, most existing clustering algorithms are classical clustering algorithms,these clustering algorithms adopt two-way decisions, namely each data object eitherbelongs to one cluster or not. In other words, these clustering algorithms are hardclustering algorithms, they simply assign each data object into only one cluster.However, in many practical applications, such as the network structure analysis, socialnetwork, biological sciences and so on, each data object may belong to more thanone cluster, classical hard clustering algorithms cannot solve this case.Therefore, the purpose of this study is to investigate the incremental andoverlapping clustering problems in clustering analysis. The contributions of this workare as follows.1. For overlapping clustering problems, this thesis presents a newstatic overlapping clustering algorithm based on representative points. Firstly, thealgorithm finds the representative points in data set. Then it calculates the similar valuesbetween the representative points, then adds some strongly connected edges or weaklyconnected edges between representative points using the three-way decision theoryaccording to the similar values between them. Finally, it finds sub-graphs of therepresentative points graph to complete the clustering of data set.2. For incremental clustering problem, this thesis proposes a new incrementaloverlapping clustering algorithm based on the structure of tree. A searching tree iscreated which is mainly used for storing information related to representative areas ofrepresentative points, the tree helps to reduce the time of searching similar space withthe new data objects. The method of creating searching tree is similar to that of creatingdecision tree, which is built using top-down approach according to the order of attribute importance. The soft incremental clustering algorithm proposed includes three steps. Itfirstly looks for representative points in new data set. Then for each representative point,it finds and updates the searching tree to find its neighbor representative points. Finally,it calculates the similar value between the new representative point and its neighborrepresentative points and uses the idea of three-way decision to cluster the newrepresentative points.The proposed method does not need to define the number of clusters in the data setin advance, it can dynamically determine the number of clusters in the process ofcalculation. The visualization experiments on artificial data set show that the proposedmethod not only can identify clusters of arbitrary shapes, but also can merge smallclusters into the big one when the data changes; the proposed method can detect newclusters which might be the result of splitting or new patterns. In addition, theexperiments on real UCI data sets show that the proposed method has betterperformance especially on F-measure and NMI indices than the compared methods inmost cases. |