Linear Structure Causal Model Estimation Based On Observed Data

Posted on: 2020-11-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: N N Fei
GTID: 1360330602450183
Subject: Applied Mathematics
Abstract/Summary:
Large amounts of data are now collected, mined, studied and applied in science, entertainment, commerce and industry. As Rutherford D. Rogers and others have said, "The world is submerged in data, we are submerged in the ocean of information, but we are thirsty for knowledge." Massive information needs to be sorted and its essence extracted: the aim is to recover the basic information in a set of data with fewer parameters and thereby uncover the causal mechanism behind the data. In recent years, several methods for discovering causal structure from non-experimental data have been proposed. These methods make various assumptions about the data-generating process so that causality can be estimated purely from observed data. Against the background of big data, this thesis aims to improve causal inference for continuous variables and studies it from four aspects: objectively estimating causal order, estimating latent variables, constructing a new framework of causal models, and proposing a nonlinear causal structure. Learning causality from observed data in this way provides an effective supplement for correctly learning causal structure and estimating causal effects. The main contents and innovations are as follows:

1. Path analysis is a main method of describing causal dependence among variables and has been adopted by many researchers. This thesis proposes an integrated causal path identification method to overcome two shortcomings: the manually specified causal ordering used in path analysis, and the view that "there is hardly any method to study and infer all causal relationships". First, a direct linear non-Gaussian acyclic model (DirectLiNGAM) is used to estimate the causal order and the initial connection strength matrix of the variables objectively. Because the initial connection strength matrix is inconvenient for model interpretation, the adaptive lasso, a linear model selection method, is then used to remove redundant directed edges and re-estimate the connection strength matrix. From the pruned connection strength matrix, a recursive model is established and the corresponding causal path diagram is plotted. Testing the causal path diagram identifies the directed edges and variables that fail the model fit test. By reversing edge directions and deleting variables, a causal model and causal path diagram with better fit are obtained, and the direct, indirect and total effects among variables are estimated.

2. Building on this causal path identification method, the thesis studies causal inference between latent variables and observed variables, and among observed variables in the presence of latent variables. Based on exploratory factor analysis (EFA) and path analysis (PA), it proposes the EFA-PA method for establishing a linear causal model framework between latent and observed variables and among observed variables. EFA-PA is similar in spirit to building a linear causal model with a structural equation model (SEM), but has three advantages over SEM. First, EFA based on principal component analysis clearly identifies the latent variables and estimates their number, which makes it easy to establish the measurement models (i.e. the linear structural causal models between latent variables and observed variables). Second, the linear structural causal model among observed variables is estimated with the PA method, which compensates for SEM's failure to fully explore causality among observed variables. Third, it reduces the blindness of adjusting causal paths when the SEM fit is poor. Furthermore, because causal relationships between observed variables are not always linear in practice, the thesis relaxes the restriction to linear causal structure among observed variables in EFA-PA and proposes a causal model for the case where the relationships among observed variables are nonlinear (with linear as a special case), namely the generalized nonlinear additive causal model (GNACM). The definition, estimation method and advantages of GNACM are given.

3. To address three shortcomings of traditional SEM in the context of big data and statistical machine learning, the thesis proposes an expanded SEM method, ESEM. The ESEM framework consists of three types of models: (1) structural models (linear structural causal models among latent variables); (2) measurement models (linear structural causal models between latent variables and observed variables); (3) observation models (linear structural causal models among observed variables). The advantages of ESEM are that it supplements the identification of causal direction among latent variables, adds estimation of the causal relationships among observed variables, and fully mines the scientific information contained in the observed variables. Finally, the ESEM model is tested and adjusted in experiments using various fit indices, yielding a better-fitting ESEM model. The method is also found to remain applicable when the observed data follow a Gaussian distribution and the disturbance variables follow a non-Gaussian distribution.
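
The direct, indirect and total effects mentioned in item 1 can be illustrated with a minimal sketch. It assumes the LiNGAM convention x = Bx + e, where entry B[i][j] of the connection strength matrix is the direct effect of variable j on variable i, and it relies on the standard result that for an acyclic model with p variables the total effects equal B + B^2 + ... + B^(p-1). The function name `decompose_effects` and the three-variable matrix are hypothetical and are not taken from the thesis.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def decompose_effects(B):
    """Return (direct, indirect, total) effect matrices for x = B x + e.

    B[i][j] is the direct effect of variable j on variable i.
    Assumption: B describes an acyclic model, so B is nilpotent and the
    finite sum B + B^2 + ... + B^(p-1) collects the effects along all
    directed paths.
    """
    p = len(B)
    total = [[0.0] * p for _ in range(p)]
    power = [row[:] for row in B]  # current power of B, starting at B^1
    for _ in range(p - 1):
        total = [[t + q for t, q in zip(t_row, q_row)]
                 for t_row, q_row in zip(total, power)]
        power = matmul(power, B)
    indirect = [[t - d for t, d in zip(t_row, d_row)]
                for t_row, d_row in zip(total, B)]
    return B, indirect, total


# Hypothetical pruned connection strength matrix for x1 -> x2 -> x3 and x1 -> x3.
B = [
    [0.0, 0.0, 0.0],  # x1 has no parents
    [0.5, 0.0, 0.0],  # x2 <- 0.5 * x1
    [0.2, 0.4, 0.0],  # x3 <- 0.2 * x1 + 0.4 * x2
]

direct, indirect, total = decompose_effects(B)
print("direct   x1 -> x3:", direct[2][0])
print("indirect x1 -> x3:", indirect[2][0])
print("total    x1 -> x3:", total[2][0])
```

In this example the total effect of x1 on x3 is the direct effect 0.2 plus the indirect effect 0.5 × 0.4 = 0.2 that passes through x2.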
Keywords/Search Tags: Observed data, Linear structure causal model, Latent variable, Causal path recognition, ESEM, Generalized nonlinear additive causal model