As an important approach to dimensionality reduction, feature selection has been used to process many kinds of high-dimensional data. Its goal is to make the selected data more concise without losing the original information. The surge of big data in recent years has posed a major challenge to feature selection. This paper proposes two improved feature selection models based on sparse learning and demonstrates their effectiveness and feasibility through a series of experiments. The main research work is as follows.

First, a more robust feature selection and clustering method is proposed for the sparse-learning-based unsupervised feature selection model. Most unsupervised feature selection methods use clustering quality as the evaluation criterion, yet they typically select features first and then cluster on the selected features. Although each step is optimal on its own, the best clustering result is not the overall optimization target, so the outcome deviates from it. This paper therefore proposes a fused feature selection model: unsupervised feature selection is embedded into K-means clustering, so that clustering is performed while features are selected, which reduces the mismatch between feature selection and clustering. In addition, outliers in the data can distort the final clustering result, so the model removes outliers during feature selection. Experiments show that, compared with classical methods, this approach achieves better clustering results.

Second, the classification model based on sparse-learning feature selection is generalized. Previous feature selection models only consider whether the samples can be separated, not how well they are separated. Building on the lasso model, this paper adds a penalty term to the objective function which ensures that the selected features not only classify the samples correctly but also increase the distance between different classes and reduce the spacing within each class. This makes it easier for a classifier trained on the selected feature subset to distinguish samples from different classes.
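The fused feature-selection-and-clustering idea can be illustrated with a minimal sketch. This is not the thesis's exact sparse-learning formulation: it simply alternates K-means clustering with a Fisher-style per-feature ranking, so that which features are kept is driven directly by the current clustering rather than fixed in advance, and it omits the outlier-removal step. The function names and the scoring rule are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, iters=30, n_init=5, seed=0):
    """Plain K-means with a few random restarts; the lowest-inertia run wins."""
    rng = np.random.default_rng(seed)
    best, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), k, replace=False)].copy()
        for _ in range(iters):
            d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
            labels = d2.argmin(1)
            for j in range(k):
                if (labels == j).any():
                    centers[j] = X[labels == j].mean(0)
        inertia = d2[np.arange(len(X)), labels].sum()
        if inertia < best_inertia:
            best, best_inertia = (labels, centers), inertia
    return best

def fused_select(X, k, m, rounds=4):
    """Alternate clustering and feature ranking, so feature selection targets
    clustering quality instead of selecting once and clustering afterwards."""
    n, d = X.shape
    keep = np.arange(d)                     # start from all features
    for _ in range(rounds):
        labels, _ = kmeans(X[:, keep], k)
        # Fisher-style score per ORIGINAL feature under the current labels:
        # between-cluster variance / within-cluster variance.
        overall = X.mean(0)
        between = np.zeros(d)
        within = np.full(d, 1e-12)
        for j in range(k):
            mask = labels == j
            if mask.any():
                between += mask.sum() * (X[mask].mean(0) - overall) ** 2
                within += mask.sum() * X[mask].var(0)
        keep = np.argsort(between / within)[::-1][:m]
    labels, _ = kmeans(X[:, keep], k)
    return np.sort(keep), labels
```

On synthetic data where only a few features carry cluster structure, re-ranking features against the evolving cluster labels lets the kept set converge to the informative features, which is the benefit the fused model claims over the select-then-cluster pipeline.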
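The generalized lasso model can likewise be sketched. The specific penalty below, a within-class scatter term added to the lasso least-squares objective and solved by proximal gradient (ISTA), is an assumption standing in for the thesis's exact term: fitting the ±1 labels pushes the class means of the scores Xw apart, while the scatter penalty tightens each class around its mean.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def discriminative_lasso(X, y, lam=0.1, gamma=0.5, iters=500):
    """Illustrative objective (not the thesis's exact one):
        min_w  1/(2n)||Xw - y||^2 + gamma/(2n) w^T S_w w + lam ||w||_1
    where S_w is the within-class scatter matrix and y is in {-1, +1}.
    Solved by ISTA: gradient step on the smooth quadratic part, then
    soft-thresholding for the l1 term."""
    n, d = X.shape
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        Xc = Xc - Xc.mean(0)                # center within the class
        Sw += Xc.T @ Xc
    A = (X.T @ X + gamma * Sw) / n          # Hessian of the smooth part
    b = X.T @ y / n
    step = 1.0 / np.linalg.eigvalsh(A).max()  # ensures convergence
    w = np.zeros(d)
    for _ in range(iters):
        w = soft_threshold(w - step * (A @ w - b), step * lam)
    return w
```

The l1 term zeroes out uninformative features as in plain lasso, while the extra quadratic term biases the surviving features toward those whose within-class spread is small relative to their correlation with the labels.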