In the era of big data, the information contained in data labels is increasingly rich, and multi-label data appears more and more frequently in practical applications. However, multi-label data is often large in volume and high in dimension, which incurs excessive computational cost during model building and training, so the need for dimensionality reduction is becoming more apparent. Feature selection is an important dimensionality reduction technique, and feature selection methods based on various theories, such as sparse learning, information theory, and manifold learning, have been proposed in abundance. The basic goal of feature selection is to select features that are highly relevant to the subsequent learning task, to minimize redundancy between features, and to improve the generalization ability of the model.

However, the number of labels in multi-label data is usually large, and there are potential associations between these labels. How to exploit the correlation between labels more effectively is therefore an important research direction for improving the performance of multi-label feature selection. In addition, class imbalance is very common in machine learning and has long been considered one of the important factors that degrade the performance of standard learning algorithms. Multi-label datasets have many label categories and the label distribution is sometimes sparse, so the impact of imbalance is even more prominent. Finally, in practical applications it is quite difficult to annotate multi-label data: a larger label space brings a higher labeling cost, and as problems become more complex, the feature dimension, data volume, and label dimension all raise that cost. How to build a better learning model under limited supervision has thus become an important research direction in multi-label learning in recent years.

Aiming at high data dimension, complex data correlations, imbalanced label distribution, and limited supervisory information, the main research contents and innovations of this paper are as follows.

First, for the "curse of dimensionality" problem in multi-label learning, a robust multi-label feature selection method based on low-dimensional embedding and manifold regularization is proposed, which carries out feature selection through a low-dimensional embedding. The feature matrix is associated with the low-dimensional embedding through a linear mapping, and the embedding is kept as consistent as possible with the true label distribution. Specifically, the method uses manifold learning to explore the local geometric structure of the features and the labels separately, and the ℓ2,1-norm is used as a sparse regularization term to improve the generalization performance of the model.

Second, to address the imbalanced distribution in multi-label feature selection, a new embedded multi-label feature selection framework is proposed: multi-label feature selection based on manifold regularization and the imbalance ratio. This method establishes a sparse learning framework that incorporates the ℓ2,1-norm and considers the local manifold structure among data samples, which better mines the internal relationships between samples and improves the discriminative ability of the feature selection model. In addition, the correlation between labels is explored through manifold learning, and an imbalance penalty factor is constructed to improve the method's ability to handle imbalanced multi-label data.

Third, to deal with multi-label data with incomplete supervisory information, a semi-supervised multi-label feature selection method is proposed. This method predicts missing labels based on the local manifold structure of the data features, establishes the mapping relationship between the feature space and the label space, and combines labeled and unlabeled data for feature selection. Moreover, an improved imbalance penalty matrix is proposed to handle the label imbalance problem under the limited supervision of semi-supervised learning. Finally, a semi-supervised multi-label feature selection learning framework based on manifold learning and imbalanced learning is established.
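The methods above share a common embedded-feature-selection core: a linear mapping W from features to labels, a manifold regularization term built from a graph Laplacian L, and an ℓ2,1-norm row-sparsity penalty on W. As a minimal illustrative sketch (not the exact formulation of this thesis; the kNN graph construction, the objective min_W ||XW − Y||²_F + α·tr(WᵀXᵀLXW) + β·||W||_{2,1}, and the parameters α and β are assumptions made here for demonstration), such an objective can be minimized by iteratively reweighted least squares, after which features are ranked by the row norms of W:

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    """Build a symmetric kNN adjacency (0/1 weights) and return L = D - S."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]  # skip index 0 (the point itself)
        S[i, nbrs] = 1.0
    S = np.maximum(S, S.T)                 # symmetrize the graph
    return np.diag(S.sum(axis=1)) - S

def l21_feature_selection(X, Y, alpha=0.1, beta=0.1, n_iter=50, eps=1e-8):
    """Sketch: min_W ||XW-Y||_F^2 + alpha*tr(W'X'LXW) + beta*||W||_{2,1}.

    Solved by iteratively reweighted least squares: with
    D = diag(1 / (2*||w_i||)), each step solves a ridge-like system.
    Returns (feature ranking by descending row norm of W, W itself).
    """
    n, d = X.shape
    L = knn_graph_laplacian(X)
    A = X.T @ X + alpha * X.T @ L @ X      # fixed part of the normal equations
    W = np.linalg.solve(A + beta * np.eye(d), X.T @ Y)   # ridge initialization
    for _ in range(n_iter):
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps  # eps avoids division by 0
        D = np.diag(1.0 / (2.0 * row_norms))             # reweighting for the l2,1 term
        W = np.linalg.solve(A + beta * D, X.T @ Y)
    scores = np.sqrt((W ** 2).sum(axis=1))               # row norms score each feature
    return np.argsort(-scores), W
```

On synthetic data where only the first two of six features determine the labels, the ℓ2,1 penalty drives the rows of W for the noise features toward zero, so those two features are ranked first; the imbalance penalty factor and missing-label prediction of the second and third methods would enter this sketch as per-label weights on the loss term.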