Research On Subspace Clustering Algorithm Guided By Soft Labels

Posted on:2024-04-09

Degree:Master

Type:Thesis

Country:China

Candidate:K D Chen

Full Text:PDF

GTID:2568307103975389

Subject:Computer technology

Abstract/Summary:


Clustering has long been an active research topic in fields such as machine learning, pattern recognition, and data mining. The goal of clustering high-dimensional data is to assign data points to clusters according to their similarity. An intuitive way to handle a high-dimensional representation is to project the data into a low-dimensional subspace, which reduces the feature dimension and enhances discriminative ability. In most existing models, however, subspace learning and feature selection are performed independently of each other. Besides discriminative features, high-dimensional data may also contain redundant or even noisy features, which degrade subsequent clustering performance and increase storage requirements and computational overhead. K-means, one of the most popular partitioning methods, learns k independent clusters by minimizing the within-cluster scatter. It is a hard clustering method and is therefore suitable only when the clusters are well separated. This condition is rarely met in real scenarios, where clusters usually overlap. In addition, most existing models take a one-dimensional vector representation of the data as input. This thesis studies these two problems and proposes the following two algorithms:

(1) Because subspace learning and feature selection are independent of each other in most models, and parameter setting is complex, this thesis proposes a soft-label-guided unsupervised discriminant sparse subspace feature selection model (UDS²FS). UDS²FS combines subspace learning with feature selection. It performs K-means clustering in the subspace to estimate the cluster assignments, which essentially minimizes the within-class scatter of the data; at the same time, it maximizes the between-class scatter to find a discriminant subspace. Once a better discriminant subspace has been found, subspace K-means clustering is performed again. The resulting cluster labels are called soft labels, and the iteration repeats until the objective function converges. In addition, the l_{2,0}-norm replaces the widely used l_{2,1}-norm for accurate feature selection. The experimental results show that UDS²FS achieves excellent performance in data clustering.

(2) To address the loss of structural information caused by one-dimensional vector input, and the fact that hard clustering does not fit real scenarios, this thesis proposes a two-dimensional embedded fuzzy data clustering model (2DEFC). 2DEFC does not vectorize two-dimensional data; it takes the two-dimensional representation of each data point directly as input, which preserves more structural information. Unsupervised dimensionality reduction and subspace fuzzy clustering are combined, and the two subspace projection matrices are jointly optimized using the membership degrees of the data. This yields better clustering results and better projection matrices, and it avoids the suboptimal solutions that arise when dimensionality reduction and clustering are performed as two separate, sequential steps. The experimental results show that 2DEFC achieves competitive performance in data clustering.
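The difference between the l_{2,1}-norm and the l_{2,0}-norm mentioned in (1) can be illustrated with a minimal sketch. It is not the UDS²FS algorithm. It assumes a projection matrix W whose rows correspond to original features, so that a row of zeros means the feature is discarded. The names `W`, `l21_norm`, `l20_norm`, and `keep_top_k_rows` are hypothetical, and keeping the k rows with the largest norm is only one common way to satisfy an l_{2,0} constraint, used here as an assumption.

```python
import numpy as np

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (a convex row-sparsity surrogate)."""
    return float(np.linalg.norm(W, axis=1).sum())

def l20_norm(W, tol=1e-12):
    """Number of rows of W whose Euclidean norm exceeds tol (count of selected features)."""
    return int((np.linalg.norm(W, axis=1) > tol).sum())

def keep_top_k_rows(W, k):
    """Zero every row except the k rows with the largest Euclidean norm.

    The result has at most k non-zero rows, i.e. l20_norm(result) <= k.
    Returns the sparsified matrix and the sorted indices of the kept rows.
    """
    row_norms = np.linalg.norm(W, axis=1)
    keep = np.argsort(row_norms)[-k:]      # indices of the k largest row norms
    W_sparse = np.zeros_like(W)
    W_sparse[keep] = W[keep]
    return W_sparse, np.sort(keep)

# Hypothetical 4-feature, 2-dimensional projection matrix (row norms 5, 0, 1, 2).
W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0],
              [0.0, 2.0]])

W_sparse, kept = keep_top_k_rows(W, k=2)
print("l2,1 norm of W:", l21_norm(W))
print("l2,0 norm of W:", l20_norm(W))
print("features kept for k=2:", kept)
```

The l_{2,1}-norm penalizes row magnitudes and only encourages row sparsity, whereas an l_{2,0} constraint fixes the number of non-zero rows directly, which is why it gives an exact count of selected features.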

Keywords/Search Tags:

Clustering, Feature Selection, Joint Optimization, Dimension Reduction, Soft Label, Two-Dimensional Data


Related items

1	Research On Unsupervised Feature Selection Methods Based On Soft-Label Learning
2	Unsupervised Clustering Algorithm Based On Dimension Reduction
3	Research And Application Of Dimension Reduction Method Based On Feature Selection
4	Dimension Reduction Technology Research Based On Text Features
5	Research On Dimension Reduction Methods For High-dimensional Complex Data
6	Variable Selection And Dimension Reduction For Soft Sensing In Fermentation Processes
7	Research On Dimension Reduction Algorithms For Preserving Clustering Structures
8	Research On Dimensionality Reduction Method Of High Dimensional Data For Trend Prediction
9	High-dimensional Data Processing And Forecasting Based On Feature Learning
10	Large Margin Based Multi-label Feature Selection