With the rapid development of science and technology, we have entered the era of big data. The data generated by application systems grow explosively. These data are not only large in scale but also high-dimensional in areas such as image and video understanding, bioinformatics, and text mining. Extensive research and practice have shown that not all features are relevant to the learning task (such as classification), and not all features improve learning performance. High-dimensional features are usually sparse and contain a large amount of irrelevant and redundant information. Thus, using such high-dimensional data directly usually incurs high computation cost, a heavy storage burden, and performance degradation caused by irrelevant and redundant features. Therefore, how to address the "curse of dimensionality" and the over-fitting brought by high-dimensional data has become a focus of research. Feature selection has shown its power to address these issues. In real-world applications, labeled data are usually scarce while unlabeled data are abundant, and labeling is expensive. Therefore, performing feature selection in an unsupervised way has become a research direction of pressing demand and significant application value. Although many unsupervised feature selection methods have been proposed, most of them suffer from the following limitation: they assume that the features and the pseudo labels induced by the cluster structure are linearly correlated. In practice, however, the correlations are more complex than linear ones. Although exploring such complex nonlinear correlations is difficult, it is valuable. To cope with these problems, we introduce the Hilbert-Schmidt Independence Criterion (HSIC) to explore more general correlations between the selected features and the pseudo cluster labels, and propose a single-view nonlinear unsupervised feature selection method and a multi-view nonlinear unsupervised feature selection method in this paper. The main results are as follows:

1. Consideration of the nonlinear correlations between the selected features and the pseudo cluster labels on single-view data. A nonlinear unsupervised feature selection method based on HSIC is proposed for single-view data: it learns the structure of the data through spectral clustering, guarantees the sparsity of the feature selection matrix via the ℓ2,1-norm, and measures the nonlinear correlations between the selected features and the pseudo cluster labels through HSIC, which distinguishes it from other methods.

2. Exploration of the nonlinear correlations between the selected features and the pseudo cluster labels on multi-view data. A nonlinear unsupervised feature selection method based on HSIC is proposed for multi-view data: it exploits the complementarity among multiple views and learns the structure of multi-view data through multi-view spectral analysis. To guarantee the consistency of multi-view data and to handle noise and outliers, it introduces consensus pseudo cluster labels, which makes our method more robust than other methods. It likewise measures the nonlinear correlations between the selected features and the pseudo cluster labels through HSIC.
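As an illustration of the dependence measure underlying both methods, the sketch below computes the standard empirical HSIC estimate tr(KHLH)/(n-1)^2 with Gaussian kernels. This is a minimal, self-contained Python example of the criterion itself, not the thesis's optimization procedure; the RBF kernel and the median-heuristic bandwidth are common defaults and are assumptions here.

```python
import numpy as np

def rbf_kernel(X, gamma=None):
    """Gaussian (RBF) kernel matrix for samples X of shape (n, d)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    d2 = np.maximum(d2, 0.0)                          # guard against round-off
    if gamma is None:
        # Median heuristic: a common default bandwidth choice (an assumption here).
        med = np.median(d2[d2 > 0])
        gamma = 1.0 / med if med > 0 else 1.0
    return np.exp(-gamma * d2)

def hsic(X, Y):
    """Empirical HSIC between paired samples X (n, d1) and Y (n, d2).

    Larger values indicate stronger (possibly nonlinear) dependence;
    HSIC is (near) zero for independent variables.
    """
    n = X.shape[0]
    K = rbf_kernel(X)
    L = rbf_kernel(Y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

For example, HSIC between a variable x and y = x^2 + noise is clearly larger than HSIC between x and an independent variable, even though their linear correlation is near zero; this is exactly the kind of nonlinear dependence that a linear-correlation assumption on pseudo labels would miss.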