With the rapid development of information technology, massive amounts of diverse, high-dimensional data have emerged in every field of society. Such data contain large numbers of irrelevant and redundant features, which cause the "curse of dimensionality" and overfitting problems for many data mining and machine learning algorithms. It is therefore necessary to reduce the dimensionality of the data through feature selection. Feature selection identifies and eliminates redundant and irrelevant features from the original data set and, without changing the physical characteristics of the data, selects the most representative feature subset whose features are only weakly correlated with one another, thereby improving the performance of data mining and machine learning algorithms. Because acquiring class labels is time-consuming, among other reasons, unsupervised feature selection is the more practical choice in real applications.

Regularized Self-Representation (RSR) and Feature-level Self-representation Feature Selection (SR-FS) are currently popular feature selection methods based on regularization constraints. Methods of this type build a self-representation model on the assumption that each feature of high-dimensional data can be represented as a linear combination of all features, and they impose regularization constraints on the feature weight matrix to perform unsupervised feature selection. Building on this theory, this thesis uses L1-norm regularization (the sparse rule operator) and inner product regularization to construct different feature selection models, and proposes two unsupervised feature selection methods based on regularized regression models.

(1) The feature-level self-representation selection method (SR-FS) exploits the self-representation property among features and treats feature selection as the optimization of a loss function model, which allows the importance of all features to be evaluated in batch. However, because each feature participates in its own reconstruction, the weight of each feature becomes excessively concentrated on itself, so the weights cannot be allocated reasonably and the resulting weight matrix is insufficiently sparse. To solve this problem, an unsupervised feature selection method based on sparse feature association is proposed. The method first establishes a feature selection model: the Frobenius norm is used for the loss term that represents the relationships among features, and an L1-norm regularization constraint is imposed on the feature weight matrix to strengthen row sparsity. A divide-and-conquer iterative shrinkage-thresholding algorithm is then designed to optimize the objective function. Finally, the importance of each feature is evaluated according to its weight, and the representative features are selected. Experiments show that the proposed method allocates feature weights reasonably while reducing computational complexity, and that the selected feature subsets achieve better clustering results and a low redundancy rate.
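As an illustration of the model in (1), the sketch below minimizes the self-representation objective ||X - XW||_F^2 + λ||W||_1 with a plain iterative shrinkage-thresholding (ISTA) loop and ranks features by the L2-norms of the rows of W. This is a minimal sketch, not the thesis's algorithm: it omits the divide-and-conquer splitting described above, and the function names and parameter values (lam, n_iter) are illustrative assumptions.

```python
import numpy as np

def soft_threshold(A, tau):
    """Entrywise soft-thresholding: the proximal operator of the L1-norm."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def l1_self_representation(X, lam=0.5, n_iter=200):
    """Minimize ||X - X W||_F^2 + lam * ||W||_1 over the d x d weight
    matrix W with plain ISTA. X is an n x d data matrix
    (n samples, d features)."""
    d = X.shape[1]
    W = np.zeros((d, d))
    # Lipschitz constant of the gradient of the smooth term.
    L = 2.0 * np.linalg.norm(X.T @ X, 2)
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ W - X)             # gradient of ||X - XW||_F^2
        W = soft_threshold(W - grad / L, lam / L)  # shrinkage step
    return W

def rank_features(W):
    """Score each feature by the L2-norm of its row of W (row sparsity
    drives unimportant features toward zero rows), ranked descending."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(-scores)

# Usage: select the top-5 features of a random data matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
W = l1_self_representation(X)
print("selected feature indices:", rank_features(W)[:5])
```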
(2) The regularized self-representation method (RSR) constrains the weight matrix with the L2,1-norm, but this cannot ensure that the selected feature subset is both highly sparse and low in redundancy. To solve this problem, an inner product regularization that directly describes the independence and significance of variables is introduced into the regularized self-representation loss function model, and an unsupervised feature selection method based on inner product regularization is proposed. The inner product regularizer is expressed through the absolute values of the inner products of the feature weight vectors, that is, R(W) = Σ_{i≠j} |⟨w_i, w_j⟩|, where w_i is the i-th row weight vector of the feature weight matrix W. This thesis also proposes an effective optimization method to solve the objective function. Experiments show that the unsupervised feature selection method based on inner product regularization achieves high sparsity and low redundancy of the feature subset simultaneously, effectively identifies important features, and eliminates redundant and irrelevant features.

In summary, this thesis focuses on how to construct regularized regression models for unsupervised feature selection. Addressing the shortcomings of the SR-FS and RSR methods, two improved unsupervised methods are proposed. Experimental results show that the proposed methods can select low-redundancy feature subsets from high-dimensional data and improve clustering accuracy.
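To make the inner product regularizer in (2) concrete, the following minimal sketch evaluates the objective ||X - XW||_{2,1} + λ · Σ_{i≠j} |⟨w_i, w_j⟩|. Only the objective is shown; the optimization method proposed in the thesis is not reproduced here, and the function names and λ value are illustrative assumptions.

```python
import numpy as np

def l21_norm(A):
    """L2,1-norm: sum of the L2-norms of the rows of A."""
    return np.linalg.norm(A, axis=1).sum()

def inner_product_reg(W):
    """Inner product regularizer: sum of |<w_i, w_j>| over all pairs of
    distinct rows w_i, w_j of the weight matrix W."""
    G = np.abs(W @ W.T)            # |<w_i, w_j>| for every pair (i, j)
    return G.sum() - np.trace(G)   # drop the diagonal terms i == j

def objective(X, W, lam):
    """Self-representation loss ||X - X W||_{2,1} plus the inner product
    regularizer, as in the model described in (2)."""
    return l21_norm(X - X @ W) + lam * inner_product_reg(W)

# Usage: an orthogonal, sparse W scores lower than a redundant one.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
W_redundant = np.ones((10, 10))    # identical, heavily overlapping rows
W_sparse = np.eye(10)              # orthogonal, non-overlapping rows
print(objective(X, W_redundant, lam=1.0))
print(objective(X, W_sparse, lam=1.0))
```

The regularizer vanishes exactly when the rows of W are mutually orthogonal, which is why minimizing it pushes the selected feature subset toward both sparsity (zero rows) and low redundancy (non-overlapping rows).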