| In recent years, with the rapid development of science and technology, the scale of multi-label data in various fields has continued to expand. In multi-label learning, many datasets suffer from the curse of dimensionality, and a large number of redundant or irrelevant features can easily degrade classification performance. Feature selection has been widely studied and adopted as an effective data preprocessing method. However, most existing feature selection algorithms still have limitations. First, although most existing methods consider the correlation among variables, they fail to explain the causality of the selected variables, and the selected feature subset may contain redundant information, leading to poor robustness of classification models. Second, multi-label learning also faces the challenge of feature dynamics in an open environment. Causal relationships among variables help to build more interpretable and robust models. Therefore, building causality-aware multi-label feature selection models for multi-label datasets has broad application value and practical significance. This thesis takes causal feature selection as its core and studies multi-label learning tasks in static and dynamic environments, respectively. The main research work is as follows. (1) Most existing multi-label feature selection algorithms ignore causality, so they lack interpretability and cannot reveal the underlying causal mechanism. The Markov blanket (MB) is an important concept in Bayesian networks: it represents the local causal structure of a variable and is the optimal feature subset for multi-label feature selection. To select causal features in multi-label data, this thesis proposes the multi-label causal feature selection based on neighborhood mutual information (MCFS-NMI) algorithm. First, the parents and children (PC) of each label are discovered using the HITON algorithm. Then, the PC and spouses (SP) of each label are distinguished based on neighborhood conditional mutual information. In addition, the equivalent-information phenomenon that arises in multi-label datasets can cause some features to be ignored, so a conditional independence test metric is designed to search for the ignored features. Finally, a minimal feature subset is obtained by searching for common features and class attributes among related labels. To validate the performance of MCFS-NMI, it is compared with five classical multi-label feature selection algorithms on six datasets. The experimental results show that the proposed algorithm achieves competitive performance against all of the compared algorithms. (2) Existing causal multi-label feature selection algorithms cannot handle streaming features. In response, building on the work in (1), the causal feature selection algorithm is extended to streaming knowledge discovery by considering dynamic streaming-feature scenarios, and the causality-based online multi-label streaming feature selection (COMSFS) algorithm is proposed. The algorithm interleaves PC learning and SP learning to mine the local causal structure of labels. In addition, it is able to distinguish PC and SP online when discovering the MB, and to identify children with parents online when discovering SP. Finally, experiments verify that the proposed algorithm achieves better classification performance. |
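
To make the Markov blanket concept concrete, the following minimal Python sketch reads the MB of a label off a known directed acyclic graph as the union of its parents, its children, and its spouses (the other parents of its children). It is not MCFS-NMI or COMSFS, which learn this structure from data using conditional independence tests; the function name `markov_blanket`, the toy graph, and the node names are assumptions made purely for illustration.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return the Markov blanket of `target` in a DAG given as node -> set of parents."""
    # Parents of the target (direct causes).
    pa = set(parents.get(target, set()))
    # Children: every node that lists the target among its parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses: the other parents of the target's children.
    spouses: Set[str] = set()
    for child in children:
        spouses |= parents[child]
    spouses.discard(target)
    return pa | children | spouses


# Hypothetical toy graph: f1 -> label, label -> f2, f3 -> f2, and f4 is isolated.
toy_dag = {
    "label": {"f1"},
    "f2": {"label", "f3"},
    "f1": set(),
    "f3": set(),
    "f4": set(),
}

mb = markov_blanket(toy_dag, "label")
print(sorted(mb))  # ['f1', 'f2', 'f3']
```

In this toy graph, `f3` enters the blanket only as a spouse through the shared child `f2`, while the isolated feature `f4` is excluded. This is the sense in which the PC set and the SP set together make up the MB.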