Research On Multi-Label Feature Selection Algorithm With Causal Relationship

Posted on:2024-07-19

Degree:Master

Type:Thesis

Country:China

Candidate:J Wang

Full Text:PDF

GTID:2568307064455844

Subject:Computer application technology

Abstract/Summary:

In recent years, with the rapid development of science and technology, the scale of multi-label data in various fields has continued to expand. In multi-label learning, many datasets suffer from the curse of dimensionality, and a large number of redundant or irrelevant features can easily degrade classification performance. Feature selection has been widely studied and adopted as an effective data preprocessing method. However, most existing feature selection algorithms still have limitations. First, although most existing methods consider the correlation among variables, they fail to explain the causality of the selected variables, and the selected feature subset may contain redundant information, leading to poor robustness of classification models. Second, multi-label learning also faces the challenge of feature dynamics in open environments. Causal relationships among variables help to build more interpretable and robust models. Building multi-label feature selection models with causal relationships for multi-label datasets therefore has broad application value and practical significance. This thesis takes causal feature selection as its core and studies multi-label learning tasks in static and dynamic environments, respectively. The main research work is as follows.

(1) Most existing multi-label feature selection algorithms ignore causality, so they lack interpretability and cannot reveal the causal mechanism. The Markov blanket (MB) is an important concept in Bayesian networks that can be used to represent the local causal structure of a variable and the optimal feature subset for multi-label feature selection. To select causal features in multi-label data, this thesis proposes the multi-label causal feature selection based on neighborhood mutual information (MCFS-NMI) algorithm. First, the parents and children (PC) of each label are discovered using the HITON algorithm. Then, the spouses (SP) of each label are searched for and distinguished from the PC based on neighborhood conditional mutual information. In addition, equivalent information in multi-label datasets can cause some features to be ignored, so a conditional independence test metric is designed to search for the ignored features. Finally, a minimal feature subset is obtained by searching for the common features and class attributes among related labels. To validate the performance of MCFS-NMI, it is compared with five classical multi-label feature selection algorithms on six datasets. The experimental results show that the proposed algorithm achieves competitive performance against all compared algorithms.

(2) Existing causal multi-label feature selection algorithms cannot handle streaming features. In response, building on the work in (1), the causal feature selection algorithm is extended to streaming knowledge discovery by considering dynamic streaming-feature scenarios, and the causality-based online multi-label streaming feature selection (COMSFS) algorithm is proposed. The algorithm interleaves PC learning and SP learning to mine the local causal structure of labels. In addition, it can distinguish PC from SP online when discovering the MB, and identify children and parents online when discovering SP. Finally, experiments show that the proposed algorithm achieves better classification performance.
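To make the MB terminology above concrete, the following minimal Python sketch reads the parents and children (PC), the spouses (SP), and the resulting MB of a label off a known directed acyclic graph. It is an illustration of the definitions only, not the MCFS-NMI or COMSFS algorithm: those methods must learn this structure from data through conditional independence tests, whereas here the graph is given. The function name `markov_blanket`, the graph `example_dag`, and the node names `f1` to `f5` and `y` are hypothetical.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Dict[str, Set[str]]:
    """Return the PC, SP and MB sets of `target` in a known DAG.

    `parents` maps every node to the set of its direct parents.
    This is a definitional sketch; it assumes the graph is already known.
    """
    # Direct parents of the target.
    pa = set(parents.get(target, set()))
    # Children: nodes that list the target among their parents.
    ch = {node for node, ps in parents.items() if target in ps}
    pc = pa | ch
    # Spouses: other parents of the target's children, reported
    # here without the target itself and without nodes already in PC.
    sp: Set[str] = set()
    for child in ch:
        sp |= parents[child]
    sp -= {target}
    sp -= pc
    return {"pc": pc, "sp": sp, "mb": pc | sp}


# Hypothetical graph: f1 -> y, y -> f2, f3 -> f2, f4 -> f3, f5 isolated.
example_dag = {
    "f1": set(),
    "y": {"f1"},
    "f2": {"y", "f3"},
    "f3": {"f4"},
    "f4": set(),
    "f5": set(),
}

result = markov_blanket(example_dag, "y")
print(sorted(result["pc"]), sorted(result["sp"]), sorted(result["mb"]))
```

In this toy graph, `f3` enters the MB of `y` only as a spouse (it shares the child `f2` with `y`), while `f4` and `f5` are excluded. This is the sense in which the MB is a compact feature subset for the label.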

Keywords/Search Tags:

Multi-label feature selection, Neighborhood mutual information, Streaming feature, Causal relationship, Markov blanket

Related items

1	Causality-based Feature Selection Research
2	Methods Of Feature Selection And Local Causal Discovery For High-dimensional Data Based On Markov Blanket
3	Online Streaming Feature Selection Algorithms:from Correlation To Causality
4	Streaming Multi-label Feature And Label Specific-feature Selection Algorithm Based On Mutual Information
5	The Research Of Multi-label Feature Selection Based On Mutual Information And Feature Label Relationship
6	Research On Feature Selection Based On Information Metrics
7	Research On Multi-label Feature Selection Algorithm For Dynamic Environment
8	The Research On Causal Feature Selection Algorithm Based On AD-tree
9	Research On Feature Selection Technology Based On Markov Blanket Representative Set
10	Online Streaming Feature Selection Based On Adaptive Neighborhood Rough Set