
Three-way Clustering Based On Ensemble Learning

Posted on: 2024-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: J C Fan
Full Text: PDF
GTID: 2568307154996309
Subject: Computer Science and Technology
Abstract/Summary:
The objective of clustering is to mine the latent similarity relations in data: samples with high similarity are placed in the same cluster, and samples with low similarity are placed in different clusters. In real environments, however, the membership relationship between samples and clusters is often unclear, and forcing a hard assignment may increase decision-making risk. Soft clustering was proposed to address this problem; it allows clusters to overlap, so each sample belongs to at least one cluster. Three-way clustering is a soft clustering approach that describes each cluster by two parts: the core region and the fringe (boundary) region. This representation can effectively describe the fuzzy boundaries of clusters and reduce decision risk. However, most current research focuses on theoretical extensions, and few studies address the ensemble clustering problem. From the perspective of three-way decision, this thesis studies ensemble learning for three-way clustering. The main contents are as follows:

(1) The traditional spectral clustering algorithm produces clusters with crisp boundaries, which does not reflect the fact that a cluster may lack a well-defined boundary in real situations. In addition, the distance measures used in spectral clustering may not satisfy both global and local consistency, particularly for multi-scale data. To address these limitations, this thesis first presents a three-way density-sensitive spectral clustering algorithm that represents each cluster by a core region and a fringe region. The proposed algorithm uses a density-sensitive distance to produce the similarity matrix. An overlapping clustering method determines the upper bound of each cluster, and the core region is then separated from the upper bound using perturbation analysis. Because a single clustering algorithm does not always yield good clustering results, we further develop an improved ensemble three-way spectral clustering algorithm based on an ensemble strategy. The ensemble algorithm randomly extracts feature subsets of the samples and applies the three-way density-sensitive clustering algorithm to obtain diverse base clustering results, which are then combined by voting to generate a three-way clustering result. The experimental results show that the three-way density-sensitive clustering algorithm explains the data structure well while maintaining good clustering performance, and that the ensemble three-way density-sensitive spectral clustering improves the robustness and stability of the clustering results.

(2) Traditional hard clustering uses a single set with a clear boundary to represent a cluster. This does not solve the problem of inaccurate decision-making caused by inaccurate information or insufficient data. To solve this problem, three-way clustering was introduced; it expresses the uncertainty in a data set by adding the concept of a fringe region. Unlike existing clustering ensemble methods, which use various clustering algorithms to produce the base clusterings, the proposed algorithm randomly extracts feature subsets of the samples and uses a traditional clustering algorithm to obtain diverse base clustering results. Label matching then aligns all base clustering results in a given order, and voting is used to obtain the core region and the fringe region of the three-way clustering. The proposed algorithm can be applied on top of any existing hard clustering algorithm, which generates the base clustering results. The experimental results show that the proposed algorithm is effective in revealing clustering structures.

(3) Ensemble clustering obtains a unified partition of the data by fusing multiple base clusterings, and three-way clustering expresses the uncertain information in a data set by introducing a boundary (fringe) region. This thesis proposes a three-way ensemble clustering algorithm based on sample perturbation theory, which can effectively improve clustering results that are degraded by inaccurate information. The algorithm uses the K-nearest neighbor algorithm to generate perturbed data sets: different values of K yield different data sets, and a traditional clustering algorithm is then applied to each of them to obtain different base clusterings. The core region and the fringe region are obtained by label matching and voting. The experimental results indicate that the model can effectively improve the structure of the clustering results and enhance clustering accuracy.
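
The label-matching and voting step mentioned in (2) and (3) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the function names `align_labels` and `three_way_vote`, the `core_threshold` parameter, and the rule "a sample enters the core region of a cluster when enough aligned base clusterings agree, otherwise it enters the fringe region of every cluster that received a vote" are all hypothetical. Label matching is done here by exhaustive search over label permutations, which is practical only for a small number of clusters; the abstract does not say which matching method the thesis uses.

```python
from collections import Counter
from itertools import permutations


def align_labels(reference, labels, k):
    """Relabel `labels` so that it agrees with `reference` as much as possible.

    Assumption: exhaustive search over all k! permutations of the cluster ids,
    so this is only suitable for small k.
    """
    best_perm, best_overlap = None, -1
    for perm in permutations(range(k)):
        overlap = sum(1 for r, l in zip(reference, labels) if perm[l] == r)
        if overlap > best_overlap:
            best_perm, best_overlap = perm, overlap
    return [best_perm[l] for l in labels]


def three_way_vote(base_clusterings, k, core_threshold=1.0):
    """Combine hard base clusterings into (core, fringe) regions by voting.

    Returns two lists of k sets of sample indices. The thresholding rule is
    an illustrative assumption, not taken from the thesis.
    """
    reference = base_clusterings[0]
    aligned = [list(reference)]
    aligned += [align_labels(reference, b, k) for b in base_clusterings[1:]]

    n_samples, n_base = len(reference), len(aligned)
    core = [set() for _ in range(k)]
    fringe = [set() for _ in range(k)]
    for i in range(n_samples):
        votes = Counter(run[i] for run in aligned)
        top_label, top_count = votes.most_common(1)[0]
        if top_count / n_base >= core_threshold:
            # Enough base clusterings agree: sample i is a core member.
            core[top_label].add(i)
        else:
            # Disagreement: sample i lies in the fringe of every voted cluster.
            for label in votes:
                fringe[label].add(i)
    return core, fringe


if __name__ == "__main__":
    # Three toy base clusterings of six samples; the second uses swapped ids.
    base = [
        [0, 0, 0, 1, 1, 1],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 1, 1, 1, 1],
    ]
    core, fringe = three_way_vote(base, k=2)
    print("core:  ", [sorted(c) for c in core])
    print("fringe:", [sorted(f) for f in fringe])
```

In this toy input, sample 2 is the only one on which the aligned base clusterings disagree, so with the default unanimous threshold it is the only sample placed in a fringe region. Lowering `core_threshold` trades fringe membership for larger core regions.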
Keywords/Search Tags: Three-way decision, Three-way clustering, Ensemble clustering, Cluster validity index