Research On Unsupervised Feature Selection Methods Based On Soft-Label Learning

Posted on: 2022-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: F Wang
Full Text: PDF
GTID: 2518306335972969
Subject: Computer software and theory
Abstract/Summary:
With the development of information technology, unlabeled high-dimensional data has become widespread. Unsupervised feature selection, a common approach to dimensionality reduction, has received considerable attention in many research fields. Because semantic labels are unavailable, improving the performance of unsupervised feature selection methods is a major challenge. The pseudo-label strategy was proposed to address this problem. It first learns pseudo-labels from a structured graph and other relevant knowledge, and then uses the learned pseudo-labels to guide the feature selection process. This strategy transforms the unsupervised problem into a supervised one and shows great potential for improving feature selection performance.

Although many unsupervised feature selection methods based on the pseudo-label strategy have been proposed, they generally suffer from the following problems. First, noisy, redundant, and irrelevant features in the original feature space reduce the quality of the pseudo-labels, and existing methods usually cannot handle them effectively. Second, fuzziness is a characteristic of real-world data. In most cases, a sample neither strictly belongs to a single cluster nor is strictly excluded from it, but belongs to all clusters simultaneously with different membership degrees. Existing methods ignore this fuzziness and learn hard labels that are either 0 or 1 as semantic supervision information, which causes the loss of important information. Third, pseudo-label learning and feature selection are performed independently of each other, with no interaction between them, so the performance of feature selection is sensitive to the quality of the pre-constructed pseudo-labels. In addition, with the explosive growth of high-dimensional data, improving the efficiency of feature selection methods has become an urgent problem.

Based on the above analysis, this thesis proposes unsupervised feature selection methods based on soft-label learning. It makes the following two contributions:

(1) An unsupervised feature selection method with adaptive soft-label learning is proposed. This method first introduces a projection matrix to transform the initial features into a robust low-dimensional representation. Then, a membership matrix is learned based on the local distance between the data samples and the cluster centroids in the low-dimensional space. The learned membership matrix is consistent with the fuzziness of the data and can be used as the soft-label matrix. Finally, soft-label learning and feature selection are integrated into a unified learning framework. Under the guidance of the soft-labels, the feature selection matrix is generated by a sparse regression model. In addition, an efficient iterative optimization strategy is designed to solve for the projection matrix, cluster centroids, soft-label matrix, and feature selection matrix. The updates of these four variables promote each other during the iterative process, so a high-quality feature selection matrix can be obtained to extract the discriminative feature subset.

(2) An unsupervised feature selection method with two-stage soft-label learning is proposed. The method first uses the efficient K-means clustering algorithm to determine the cluster centroids, and then learns fixed initial soft-labels based on the local distance between the samples and the cluster centroids. On this basis, two regression models are used to perform dynamic soft-label adjustment and feature selection simultaneously. The soft-labels guide the feature selection process, and the feature selection process in turn promotes the dynamic adjustment of the soft-labels. This method simplifies the soft-label learning steps and improves the efficiency of the approach. At the same time, soft-label learning and feature selection promote each other, which ensures the precision of the approach.

In summary, this thesis puts forward two unsupervised feature selection models based on soft-label learning. Experimental results on clustering tasks show that, compared with existing unsupervised feature selection methods, the proposed methods significantly improve the quality of the selected feature subsets and the precision of subsequent learning algorithms, while maintaining high learning efficiency.
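
The difference between hard labels and soft-labels, and the way a regression matrix can be turned into a feature ranking, can be illustrated with a minimal sketch. It is not the thesis's formulation. The abstract gives no formulas, so the softmax-over-distances membership, the `temperature` parameter, the function names `soft_labels` and `rank_features`, and the ranking of features by the row norms of a regression matrix `W` are all assumptions made for illustration. The centroids are taken as given, for example from K-means.

```python
import math


def soft_labels(samples, centroids, temperature=1.0):
    """Return one membership row per sample; each row sums to 1.

    Assumption: membership is a softmax over negative squared distances.
    The thesis only says memberships depend on sample-to-centroid distance.
    """
    labels = []
    for x in samples:
        # Squared Euclidean distance from this sample to every centroid.
        d2 = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centroids]
        # Subtract the smallest distance so the largest exponent is 0.
        d_min = min(d2)
        weights = [math.exp(-(d - d_min) / temperature) for d in d2]
        total = sum(weights)
        labels.append([w / total for w in weights])
    return labels


def rank_features(W):
    """Rank feature indices by the l2-norm of their row in W, largest first.

    Assumption: W has one row per feature, and a larger row norm means
    the feature matters more to the regression onto the soft-labels.
    """
    scores = [math.sqrt(sum(w * w for w in row)) for row in W]
    return sorted(range(len(W)), key=lambda j: scores[j], reverse=True)


# Hypothetical toy data: three 2-D samples and two given centroids.
samples = [[0.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
centroids = [[0.0, 0.0], [4.0, 0.0]]
Y = soft_labels(samples, centroids)

# Hypothetical regression matrix with one row per feature.
W = [[0.1, 0.0], [2.0, 1.0], [0.5, 0.5]]
ranking = rank_features(W)
```

The middle sample is equidistant from both centroids and receives equal membership in each cluster, which a hard 0/1 label could not express. In the methods summarized above, the regression matrix is learned jointly with the soft-labels rather than supplied by hand as it is here.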
Keywords/Search Tags:Unsupervised feature selection, Dimension reduction, Fuzziness, Soft-label