
Research On Confounder Detection And Causal Structure Learning For Incomplete Observed Data

Posted on: 2021-02-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Chen
Full Text: PDF
GTID: 1360330602493445
Subject: Computer applications engineering
Abstract/Summary:
Causal discovery is a core issue in data science research. Exploring and discovering the causal relationships between objects has important applications in uncovering the causal mechanisms behind events and in assisting decision-making, and it has attracted attention in many fields such as economics, bioinformatics, neuroscience, and social network analysis. However, it is not easy to obtain complete observation data during data collection. Inferring causal relations from incomplete data is prone to producing false causal relationships between observed variables because of latent confounders. Applying such causal inference methods to real-world high-dimensional data easily propagates these false relationships to more cases, resulting in unreliable causal structure learning. Therefore, this dissertation addresses four challenges brought by incomplete observation data: a high false discovery rate when inferring causal relationships, insufficient ability to discover latent confounders, unreliable causal structure learning from high-dimensional data, and limited applicable scenarios. We propose a causal relationship inference method, a latent confounder detection method, and a causal structure learning method, and show the correctness and reliability of these methods. Moreover, these methods have application value in wireless network performance optimization, neuroscience, and social network behavior analysis. Specifically, the research content is as follows.

1. A causal structure learning algorithm, MCLiNGAM, based on multiple sets of canonically correlated variables is proposed. MCLiNGAM introduces multiple sets of implicit canonical correlation variables and establishes a linear non-Gaussian acyclic model over them, which effectively represents the causal relationships between the canonical correlation variables and their corresponding observed variables. Considering the characteristics of the linear functional causal model, we propose an objective function that maximizes the non-Gaussianity of the noise, measured by kurtosis, and incorporates the relationship constraints between the canonical correlation variables and the observed variables, in order to infer the causal relationships among the implicit canonical correlation variables. Finally, MCLiNGAM constructs a causal network of the multiple sets of canonical correlation variables and quickly finds an optimal or near-optimal solution with higher accuracy. MCLiNGAM thus effectively solves the problem of learning the causal network of multiple sets of hidden canonical correlation variables. Its performance has been verified in simulation experiments and on a real-world wireless network performance optimization dataset.

2. To handle the case where multiple latent variables cannot be represented by a set of observed variables, we propose MLCLiNGAM, a multiple latent confounder detection algorithm based on causal structure learning. MLCLiNGAM first learns the causal skeleton of the observed variables using a constraint-based method. Then, based on the characteristics of the linear non-Gaussian data-generating process, it infers the causal relationships between those observed variables that are not affected by latent confounders. Finally, it uses the maximal acyclic causal skeleton technique to detect the existence and number of latent confounders, and constructs a causal network containing them. This method can quickly find the observed variables affected by latent confounders and resolves the difficulty of learning a causal structure that contains multiple latent confounders. MLCLiNGAM obtains the highest precision on the simulated experimental dataset and on a commonly used benchmark dataset.

3. To further address the problem that MLCLiNGAM cannot identify the causal structure among latent confounders and observed variables, we propose FRITL, a causal structure learning algorithm for incomplete observation data. FRITL first obtains a Partial Ancestral Graph (PAG) based on conditional independence tests; second, it determines the causal relationship between each pair of unconfounded observed variables; third, it uses the Triad constraint to detect latent confounders and merge those that are the same; and finally, it applies an over-complete independent component analysis technique to estimate the small local causal structures that remain undetermined. By combining the reliability of conditional independence tests with the correctness of local structure learning, this method can quickly locate the latent confounders and can also be applied to small-sample and high-dimensional data. It addresses the problems of latent confounder detection and causal structure learning on small-sample and high-dimensional data, and its effectiveness and correctness have been verified on fMRI data, which offers a reference for other research fields.

4. Considering that the above methods can only handle continuous data, we propose MCN, a method for learning the underlying causal structure among users from discrete social network user behavior, and apply it in practical scenarios. MCN uses a minimum description length criterion and a constraint-based causal discovery method to mine the non-redundant causal relationships behind user behavior sequences. We use transfer entropy with an adaptive causal time lag length to detect causal directions and to find the length of the causal lag. Since the resulting causal network may contain redundant edges, we adopt a permutation-based method to remove them. The results of simulation experiments show that our method significantly improves the ability and correctness of causal structure learning for social network users. We also applied this method to Sina Weibo data and discovered a number of interesting results. This work provides relevant experience for subsequent research on causal structure learning of social network users.
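
As an illustration of the transfer-entropy step described for MCN, the following is a minimal sketch and not the dissertation's implementation. It assumes discrete behavior sequences of equal length, a plug-in (frequency-count) estimator, one step of target history, and a simple scan over candidate lags in place of the adaptive lag procedure, whose details the abstract does not give. The function names `transfer_entropy` and `best_lag` and the synthetic data are hypothetical.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target, lag=1):
    """Plug-in estimate (in bits) of transfer entropy from source to target.

    Assumption for this sketch: one step of target history and a single
    source value taken `lag` steps in the past.
    """
    if lag < 1:
        raise ValueError("lag must be >= 1")
    # Each sample is (y_t, y_{t-1}, x_{t-lag}).
    triples = [(target[t], target[t - 1], source[t - lag])
               for t in range(lag, len(target))]
    n = len(triples)
    if n == 0:
        return 0.0
    c_full = Counter(triples)
    c_past_src = Counter((yp, xs) for _, yp, xs in triples)
    c_now_past = Counter((y, yp) for y, yp, _ in triples)
    c_past = Counter(yp for _, yp, _ in triples)
    te = 0.0
    for (y, yp, xs), count in c_full.items():
        p_joint = count / n
        p_given_both = count / c_past_src[(yp, xs)]      # p(y_t | y_{t-1}, x_{t-lag})
        p_given_hist = c_now_past[(y, yp)] / c_past[yp]  # p(y_t | y_{t-1})
        te += p_joint * math.log2(p_given_both / p_given_hist)
    return te


def best_lag(source, target, max_lag=5):
    """Hypothetical stand-in for adaptive lag selection: scan lags 1..max_lag."""
    scores = {lag: transfer_entropy(source, target, lag)
              for lag in range(1, max_lag + 1)}
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Synthetic example: y copies x with a delay of 2 steps and 10% random flips.
rng = random.Random(0)
n = 2000
x = [rng.randint(0, 1) for _ in range(n)]
y = [rng.randint(0, 1), rng.randint(0, 1)]
for t in range(2, n):
    flip = rng.random() < 0.1
    y.append(x[t - 2] ^ int(flip))

lag, te_xy = best_lag(x, y)
_, te_yx = best_lag(y, x)
print(f"x -> y: lag={lag}, TE={te_xy:.3f} bits")
print(f"y -> x: best TE={te_yx:.3f} bits")
```

In this synthetic setting the estimate is much larger in the x to y direction than in the reverse direction, and the lag with the highest score matches the delay used to generate the data. A permutation test, as mentioned in the abstract for pruning redundant edges, could be layered on top by shuffling the source sequence and comparing scores.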
Keywords/Search Tags:incomplete observed data, constraint-based method, causal functional model, confounder detection, causal structure learning