Research On Subspace Clustering Algorithm Guided By Soft Labels

Posted on: 2024-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: K D Chen
GTID: 2568307103975389
Subject: Computer technology
Abstract/Summary:
Clustering has long been an active research topic in machine learning, pattern recognition, and data mining. The goal of clustering high-dimensional data is to assign data points to clusters according to their similarity. An intuitive way to handle high-dimensional representations is to project the data into a low-dimensional subspace, which reduces the feature dimension and enhances discriminative ability. In most existing models, however, subspace learning and feature selection are carried out independently of each other. Besides discriminative features, high-dimensional data may also contain redundant or even noisy features, which degrade subsequent clustering performance and increase storage requirements and computational overhead.

As one of the most popular partitioning methods, K-means clustering learns k independent clusters by minimizing the within-cluster scatter. K-means is generally a hard clustering method, which is suitable only when the clusters are well separated. This condition is difficult to meet in real scenarios, where clusters usually overlap. In addition, most existing models take a one-dimensional vector representation of the data as input. This thesis studies these two problems and proposes the following two algorithms.

(1) Because most models treat subspace learning and feature selection independently and require complex parameter settings, this thesis proposes a soft-label-guided unsupervised discriminant sparse subspace feature selection model (UDS2FS). UDS2FS combines subspace learning and feature selection, and performs subspace K-means clustering to estimate the cluster assignments, which essentially minimizes the within-class scatter of the data. At the same time, the between-class scatter is maximized to find the discriminant subspace. Once a better discriminant subspace has been found, subspace K-means clustering is performed again; the resulting cluster labels are called soft labels, and the iteration is repeated until the objective function converges. In addition, the l2,0-norm replaces the widely used l2,1-norm for accurate feature selection. The experimental results show that UDS2FS has excellent performance in data clustering.

(2) To address the loss of structural information caused by one-dimensional vector input, and the fact that hard clustering does not match real scenarios, this thesis proposes two-dimensional embedded fuzzy data clustering (2DEFC). 2DEFC does not vectorize two-dimensional data; it takes the two-dimensional representation of the data points directly as input, which preserves more structural information. Unsupervised dimensionality reduction and subspace fuzzy data clustering are combined, and the two subspace projection matrices are jointly optimized using the data membership degrees, yielding better clustering results and better subspace projection matrices. This also avoids the suboptimal results of the traditional approach of reducing the dimension first and clustering afterwards. The experimental results show that the 2DEFC model achieves competitive performance in data clustering.
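The quantities named in the abstract can be illustrated with a small pure-Python sketch. It is not the thesis's UDS2FS or 2DEFC implementation, whose objective functions are not given here. It only shows the textbook definitions of the within-cluster scatter that K-means minimizes, the l2,1- and l2,0-norms of a projection matrix, and the standard fuzzy c-means membership update as one common example of soft (overlapping) assignment. All function names, the fuzzifier `m`, and the tolerance `tol` are assumptions made for illustration.

```python
import math


def row_norms(W):
    """Euclidean norm of each row of W (assumed: one row per original feature)."""
    return [math.sqrt(sum(v * v for v in row)) for row in W]


def l21_norm(W):
    """l2,1-norm: sum of the row norms of W."""
    return sum(row_norms(W))


def l20_norm(W, tol=1e-12):
    """l2,0-norm: number of rows of W with non-zero norm (features kept).

    `tol` is an illustrative threshold for treating a row as zero.
    """
    return sum(1 for n in row_norms(W) if n > tol)


def within_cluster_scatter(X, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    dim = len(X[0])
    total = 0.0
    for c in range(k):
        members = [x for x, lab in zip(X, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        centroid = [sum(x[j] for x in members) / len(members) for j in range(dim)]
        total += sum((x[j] - centroid[j]) ** 2 for x in members for j in range(dim))
    return total


def fuzzy_memberships(X, centroids, m=2.0):
    """Standard fuzzy c-means membership update (not the 2DEFC update).

    Each row of the result sums to 1; `m` > 1 is the usual fuzzifier.
    """
    U = []
    for x in X:
        d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
        if any(d == 0.0 for d in d2):
            # The point coincides with one or more centroids: share the
            # full membership equally among those centroids.
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            U.append([h / sum(hits) for h in hits])
            continue
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        U.append([v / sum(inv) for v in inv])
    return U
```

For a matrix with rows (3, 4), (0, 0) and (1, 0), `l21_norm` gives 6.0 while `l20_norm` gives 2: the l2,0-norm counts the selected features directly, whereas the l2,1-norm also depends on how large the surviving rows are.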
Keywords/Search Tags: Clustering, Feature Selection, Joint Optimization, Dimension Reduction, Soft Label, Two-Dimensional Data