
Research On Causality-based Feature Selection And Structure Learning

Posted on: 2021-02-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z L Ling
GTID: 1368330614959957
Subject: Software engineering
Abstract/Summary:
In the big data era, feature selection is a crucial preprocessing step for data analytics. However, feature selection algorithms based on the correlations between features and a class attribute may produce prediction models that lack interpretability, actionability, and robustness. Causality-based feature selection instead identifies the Markov blanket (MB) of the class attribute, which consists of its parents (direct causes), children (direct effects), and spouses (other direct causes of its direct effects), and thus explicitly induces the local causal relationships between the class attribute and the features. As an emerging approach to identifying potential causal features for building interpretable, actionable, and robust classification models, causality-based feature selection has attracted significant attention from both the machine learning and causal discovery communities. Beyond classification, causality-based feature selection plays an essential role in learning the local Bayesian network (BN) structure around a variable of interest, since identifying the MB of that variable is key to local structure learning. Moreover, if the MBs of all variables in a data set can be identified, they can serve as constraints that reduce the search space for efficient local-to-global BN structure learning. In this dissertation, we study causality-based feature selection and causal structure learning. Our main contributions are as follows.

(1) We propose BAMB (BAlanced Markov Blanket discovery), a novel MB discovery algorithm that balances efficiency and accuracy. Given a class attribute of interest, BAMB simultaneously finds the candidate PC (parents and children) and spouses and removes false positives from the candidate MB set. Specifically, once a feature is successfully added to the current PC set, BAMB finds the spouses with respect to this feature and then uses the updated PC and spouse sets to remove false positives from the current MB set. This keeps the candidate PC and spouse sets of the target as small as possible and thus achieves a trade-off between computational efficiency and learning accuracy. In the experiments, we first compare BAMB with 8 state-of-the-art MB discovery algorithms on 7 benchmark Bayesian networks; we then use 10 real-world datasets to compare BAMB with 12 feature selection algorithms, comprising 8 state-of-the-art MB discovery algorithms and 4 other well-established feature selection methods. In prediction accuracy, BAMB outperforms all 12 compared feature selection algorithms. In computational efficiency, BAMB is comparable to the IAMB algorithm and much faster than the remaining seven MB discovery algorithms.

(2) We propose LCS-FS (Local Causal Structure learning by Feature Selection), an efficient algorithm for learning the local causal structure around a target variable that improves the efficiency of skeleton construction. First, to construct the local causal skeleton of the target, LCS-FS uses feature selection to find the PC without searching for conditioning sets, which speeds up PC discovery and hence skeleton construction. Second, to orient the edges of this local skeleton, we propose an efficient method that searches for separating sets among subsets of the PC in order to identify V-structures. Combining feature selection with this new way of finding separating sets, LCS-FS recursively finds the spouses in the Markov blankets of the nodes in the local causal skeleton to orient edges, until the direct causes and direct effects of the target are distinguished. Experiments on five benchmark Bayesian networks with 35 to 801 variables show that LCS-FS achieves higher efficiency and better accuracy than state-of-the-art local causal structure learning algorithms.

(3) We study an interesting and challenging problem: learning any part of a BN structure. Here, using existing global BN structure learning algorithms to learn the entire BN structure merely to obtain the part of interest is computationally inefficient, while local BN structure learning algorithms suffer from the false edge orientation problem when applied directly to this problem. We first present a new concept, Expand-Backtracking, to explain why local BN structure learning methods produce false edge orientations, and then propose APSL (Any Part of BN Structure Learning), an efficient and accurate algorithm for this problem. Specifically, APSL divides the V-structures in an MB into two types, collider V-structures and non-collider V-structures; starting from a node of interest, it recursively finds both types of V-structures in the discovered MBs until the part of the BN structure of interest is oriented. To further improve efficiency, we design APSL-FS, a variant of APSL that uses feature selection. Extensive experiments on six benchmark BNs validate the efficiency and accuracy of our methods.
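The Markov blanket definition used throughout (parents, children, and spouses of the target) can be made concrete when the DAG is already known. The following is a minimal sketch, assuming a hypothetical dict-of-parent-sets graph representation and toy node names; it only reads the MB off a given graph. It is not BAMB, which must instead infer the MB from data using conditional independence tests.

```python
from typing import Dict, Set

# Hypothetical representation: each node maps to the set of its parents.
Dag = Dict[str, Set[str]]


def parents(dag: Dag, node: str) -> Set[str]:
    """Direct causes of `node`."""
    return set(dag[node])


def children(dag: Dag, node: str) -> Set[str]:
    """Direct effects of `node`: every node that lists it as a parent."""
    return {v for v, pa in dag.items() if node in pa}


def spouses(dag: Dag, node: str) -> Set[str]:
    """Other direct causes of the direct effects of `node`."""
    result: Set[str] = set()
    for child in children(dag, node):
        result |= dag[child]
    result.discard(node)  # the node itself is not its own spouse
    return result


def markov_blanket(dag: Dag, node: str) -> Set[str]:
    """MB(node) = parents | children | spouses."""
    return parents(dag, node) | children(dag, node) | spouses(dag, node)


# Toy graph: A -> T -> C <- B, C -> D, and E is isolated.
toy: Dag = {"A": set(), "B": set(), "E": set(), "T": {"A"}, "C": {"T", "B"}, "D": {"C"}}
print(sorted(markov_blanket(toy, "T")))  # prints ['A', 'B', 'C']
```

In this toy graph, the grandchild D and the isolated node E fall outside MB(T): under the faithfulness assumption, conditioning on {A, B, C} renders T independent of every other variable, which is why the MB is the natural target of causality-based feature selection.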
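LCS-FS and APSL orient edges by identifying V-structures from separating sets. The orientation step can be illustrated with the standard collider rule of constraint-based structure learning (as used in the PC algorithm): for a triple X - Y - Z in the skeleton where X and Z are non-adjacent, orient X -> Y <- Z if and only if Y is not in the separating set of X and Z. This is a sketch under stated assumptions: the skeleton and separating sets are given as hypothetical toy inputs, whereas LCS-FS obtains them from data and searches for separating sets among subsets of the PC, which is not reproduced here.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Skeleton = Dict[str, Set[str]]            # symmetric undirected adjacency sets
SepSets = Dict[FrozenSet[str], Set[str]]  # separating set of each non-adjacent pair


def find_v_structures(skeleton: Skeleton, sepsets: SepSets) -> Set[Tuple[str, str, str]]:
    """Return colliders (x, y, z), meaning x -> y <- z, with x < z."""
    found: Set[Tuple[str, str, str]] = set()
    for y, nbrs in skeleton.items():
        for x, z in combinations(sorted(nbrs), 2):
            if z in skeleton[x]:
                continue  # x and z are adjacent: shielded triple, not a V-structure
            sep = sepsets.get(frozenset((x, z)))
            if sep is None:
                continue  # no separating set recorded: leave the triple undecided
            if y not in sep:
                found.add((x, y, z))  # y did not separate x and z, so it is a collider
    return found


# Toy skeleton A - C - B, C - D, with separating sets taken as given.
skel: Skeleton = {"A": {"C"}, "B": {"C"}, "C": {"A", "B", "D"}, "D": {"C"}}
seps: SepSets = {
    frozenset(("A", "B")): set(),   # A and B marginally independent
    frozenset(("A", "D")): {"C"},
    frozenset(("B", "D")): {"C"},
}
print(sorted(find_v_structures(skel, seps)))  # prints [('A', 'C', 'B')]
```

Only A -> C <- B is oriented here, because C separates both A and D and B and D. Any remaining undirected edges would be handled by further propagation rules, which this sketch omits.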
Keywords/Search Tags: Bayesian network, Feature Selection, Markov blanket, Local causal structure learning, Any part of structure learning