With the rapid development of information technology, massive amounts of diverse, high-dimensional data have emerged in every field of society. Such data contain large numbers of irrelevant and redundant features, which cause the "curse of dimensionality" and overfitting problems for many data mining and machine learning algorithms. It is therefore necessary to reduce the dimensionality of the data through feature selection. Feature selection identifies and eliminates redundant and irrelevant features from the original data set and, without changing the physical characteristics of the data, selects the most representative feature subset whose features are only weakly correlated with one another, thereby improving the performance of data mining and machine learning algorithms. Because acquiring class labels is time-consuming, among other reasons, unsupervised feature selection is the more practical choice in real applications.

Regularized Self-Representation (RSR) and Feature-level Self-representation Feature Selection (SR-FS) are currently popular feature selection methods based on regularization constraints. Methods of this type build a self-representation model on the assumption that each feature of high-dimensional data can be represented as a linear combination of all features, and they impose regularization constraints on the feature weight matrix to perform unsupervised feature selection. Building on this theory, this thesis uses L1-norm regularization (the sparse rule operator) and inner product regularization to construct different feature selection models, and proposes two unsupervised feature selection methods based on regularized regression models.

(1) The feature-level self-representation selection method (SR-FS) exploits the self-representation property among features and treats feature selection as the optimization of a loss function model, which allows the importance of all features to be evaluated in batch. However, because each feature participates in its own reconstruction, the weight of each feature becomes excessively concentrated on itself, so the weights cannot be allocated reasonably and the resulting weight matrix is insufficiently sparse. To solve this problem, an unsupervised feature selection method based on sparse feature association is proposed. The method first establishes a feature selection model: the Frobenius norm is used for the loss term that represents the relationships among features, and an L1-norm regularization constraint is imposed on the feature weight matrix to strengthen row sparsity. A divide-and-conquer iterative shrinkage-thresholding algorithm is then designed to optimize the objective function. Finally, the importance of each feature is evaluated according to its weight, and the representative features are selected. Experiments show that the proposed method allocates feature weights reasonably while reducing computational complexity, and that the selected feature subsets achieve better clustering results and a low redundancy rate.
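As an illustration of the model in (1), the sketch below minimizes the self-representation objective ||X - XW||_F^2 + λ||W||_1 with a plain iterative shrinkage-thresholding (ISTA) loop and ranks features by the L2-norms of the rows of W. This is a minimal sketch, not the thesis's algorithm: it omits the divide-and-conquer splitting described above, and the function names and parameter values (lam, n_iter) are illustrative assumptions.

```python
import numpy as np

def soft_threshold(A, tau):
    """Entrywise soft-thresholding: the proximal operator of the L1-norm."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def l1_self_representation(X, lam=0.5, n_iter=200):
    """Minimize ||X - X W||_F^2 + lam * ||W||_1 over the d x d weight
    matrix W with plain ISTA. X is an n x d data matrix
    (n samples, d features)."""
    d = X.shape[1]
    W = np.zeros((d, d))
    # Lipschitz constant of the gradient of the smooth term.
    L = 2.0 * np.linalg.norm(X.T @ X, 2)
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ W - X)             # gradient of ||X - XW||_F^2
        W = soft_threshold(W - grad / L, lam / L)  # shrinkage step
    return W

def rank_features(W):
    """Score each feature by the L2-norm of its row of W (row sparsity
    drives unimportant features toward zero rows), ranked descending."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(-scores)

# Usage: select the top-5 features of a random data matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
W = l1_self_representation(X)
print("selected feature indices:", rank_features(W)[:5])
```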
(2) The regularized self-representation method (RSR) constrains the weight matrix with the L2,1-norm, but this cannot ensure that the selected feature subset is both highly sparse and low in redundancy. To solve this problem, an inner product regularization that directly describes the independence and significance of variables is introduced into the regularized self-representation loss function model, and an unsupervised feature selection method based on inner product regularization is proposed. The inner product regularizer is expressed through the absolute values of the inner products of the feature weight vectors, that is, R(W) = Σ_{i≠j} |⟨w_i, w_j⟩|, where w_i is the i-th row weight vector of the feature weight matrix W. This thesis also proposes an effective optimization method to solve the objective function. Experiments show that the unsupervised feature selection method based on inner product regularization achieves high sparsity and low redundancy of the feature subset simultaneously, effectively identifies important features, and eliminates redundant and irrelevant features.

In summary, this thesis focuses on how to construct regularized regression models for unsupervised feature selection. Addressing the shortcomings of the SR-FS and RSR methods, two improved unsupervised methods are proposed. Experimental results show that the proposed methods can select low-redundancy feature subsets from high-dimensional data and improve clustering accuracy.
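To make the inner product regularizer in (2) concrete, the following minimal sketch evaluates the objective ||X - XW||_{2,1} + λ · Σ_{i≠j} |⟨w_i, w_j⟩|. Only the objective is shown; the optimization method proposed in the thesis is not reproduced here, and the function names and λ value are illustrative assumptions.

```python
import numpy as np

def l21_norm(A):
    """L2,1-norm: sum of the L2-norms of the rows of A."""
    return np.linalg.norm(A, axis=1).sum()

def inner_product_reg(W):
    """Inner product regularizer: sum of |<w_i, w_j>| over all pairs of
    distinct rows w_i, w_j of the weight matrix W."""
    G = np.abs(W @ W.T)            # |<w_i, w_j>| for every pair (i, j)
    return G.sum() - np.trace(G)   # drop the diagonal terms i == j

def objective(X, W, lam):
    """Self-representation loss ||X - X W||_{2,1} plus the inner product
    regularizer, as in the model described in (2)."""
    return l21_norm(X - X @ W) + lam * inner_product_reg(W)

# Usage: an orthogonal, sparse W scores lower than a redundant one.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
W_redundant = np.ones((10, 10))    # identical, heavily overlapping rows
W_sparse = np.eye(10)              # orthogonal, non-overlapping rows
print(objective(X, W_redundant, lam=1.0))
print(objective(X, W_sparse, lam=1.0))
```

The regularizer vanishes exactly when the rows of W are mutually orthogonal, which is why minimizing it pushes the selected feature subset toward both sparsity (zero rows) and low redundancy (non-overlapping rows).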