
Some New Dimension Reduction Methods And Their Applications In Variable Screening And Causal Inference

Posted on: 2022-05-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Li
Full Text: PDF
GTID: 1487306482487474
Subject: Statistics
Abstract/Summary:
The rapid development of data-collection technology in areas such as biology, financial econometrics, and signal processing has posed a great challenge for traditional multivariate analysis. High-dimensional data analysis has become ubiquitous and increasingly important. Dimension reduction, and in particular sufficient dimension reduction for regression, offers an appealing avenue for tackling high-dimensional problems. It is often desirable to reduce the dimensionality of the problem by replacing the original high-dimensional data with a low-dimensional space composed of a few linear combinations of the predictors, whose number is usually much smaller than the original dimension. Partial dimension reduction arises when the predictors naturally fall into two sets, X and W, and pursues a dimension reduction of X alone. We propose several new sufficient dimension reduction methods and apply classical sufficient dimension reduction to causal inference. The contributions of this thesis are as follows:

(1) Although partial dimension reduction is a very general problem, few results are available when W is continuous. To the best of our knowledge, existing methods generally perform poorly when X and W are related; moreover, none can handle the situation where the reduced lower-dimensional subspace of X varies with W. To address this issue, we propose a novel variable-dependent partial dimension reduction framework and adapt classical sufficient dimension reduction methods to this general paradigm. The asymptotic consistency of our method is established. Extensive numerical studies and real data analyses show that the variable-dependent partial dimension reduction method outperforms existing methods.

(2) Martingale difference divergence measures the departure from conditional mean independence of two random vectors. We develop a generalized martingale difference divergence (and its correlation) based on symmetric Lévy measures to detect such independence, and apply the generalized martingale difference correlation as a marginal utility for high-dimensional variable screening. Both simulation results and real data illustrations show the promising performance of the proposed indexes.

(3) We review several popular inverse regression methods, including sliced inverse regression (SIR) and principal Hessian directions (PHD). Adopting a conditional characteristic function approach, we develop a new class of slicing-free methods, parallel to the classical SIR and PHD, named weighted inverse regression ensemble (WIRE) and weighted PHD (WPHD), respectively. Their relationship with the recently developed martingale difference divergence matrix (MDDM) and volatility MDDM (VMDDM) is also revealed. At the sample level, we show that the estimators of WIRE and WPHD are root-n consistent, and the ladle approach is used to determine the dimension of the central subspace.

(4) The conditional average treatment effect (CATE) captures the heterogeneity of a treatment effect across subpopulations. Under the unconfoundedness assumption, the outcome regression approach can be applied to estimate the CATE. We propose and study regression-based CATE estimators under, respectively, true (oracle), parametric, nonparametric, and semiparametric dimension reduction structures, and derive their asymptotically linear representations and asymptotic normality. From the asymptotic variance functions we derive the asymptotic efficiency ranking of the four estimators in general; how the efficiency is related to the affiliation of the given covariates; what roles bandwidth and kernel selection play for efficiency; how the nonparametric-based CATE estimator can be superior to the other estimators; and in which scenarios the semiparametric-based CATE estimator should be used. Further, we prove that any regression-based CATE estimator can be asymptotically more efficient than any propensity-score-based CATE estimator. These results give a relatively complete picture of regression-based CATE estimation.
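To fix ideas on partial dimension reduction, the following is a minimal sketch of the classical slicing approach that the variable-dependent framework in (1) generalizes: slice on W, run SIR of y on X within each W-slice, and pool the SIR kernels. It assumes a single fixed subspace of X (precisely the limitation the thesis removes) and, for simplicity, that X and W are independent; the function name `partial_sir` and all tuning defaults are illustrative, not the thesis's own estimator.

```python
import numpy as np

def partial_sir(X, W, y, n_w_slices=4, n_y_slices=5, d=1):
    """Classical partial SIR sketch: pool SIR kernels of y on X across
    slices of a continuous W.  Assumes one fixed d-dim subspace of X
    (the thesis's framework lets this subspace vary with W) and X ⟂ W
    (so globally whitened X has mean ≈ 0 within each W-slice)."""
    n, p = X.shape
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(0)) @ inv_sqrt              # whitened predictors
    M = np.zeros((p, p))
    for w_idx in np.array_split(np.argsort(W), n_w_slices):
        Zw, yw = Z[w_idx], y[w_idx]
        # SIR kernel within this W-slice: weighted slice means of Z
        for y_idx in np.array_split(np.argsort(yw), n_y_slices):
            m = Zw[y_idx].mean(0)
            M += (len(y_idx) / n) * np.outer(m, m)
    _, vecs = np.linalg.eigh(M)
    B = inv_sqrt @ vecs[:, ::-1][:, :d]         # top-d directions, X scale
    return B / np.linalg.norm(B, axis=0)
```

With y depending on X only through one linear combination (plus an additive effect of W), the leading eigenvector recovers that direction up to sign.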
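The screening idea in (2) can be illustrated with the plain (ungeneralized) martingale difference divergence of Shao and Zhang, used as a marginal utility: rank each predictor by its sample MDD with the response and keep the top-ranked columns. The generalized Lévy-measure version developed in the thesis would replace the kernel; the helper names `mdd` and `mdd_screen` are hypothetical.

```python
import numpy as np

def mdd(y, x):
    """Sample martingale difference divergence MDD^2(y | x) for scalar x:
    -(1/n^2) * sum_{i,j} (y_i - ybar)(y_j - ybar) |x_i - x_j|,
    which is >= 0 and equals 0 iff E[y | x] = E[y] (in population)."""
    yc = y - y.mean()
    d = np.abs(x[:, None] - x[None, :])         # pairwise |x_i - x_j|
    return -(yc[:, None] * yc[None, :] * d).mean()

def mdd_screen(y, X, top_k):
    """Rank the columns of X by marginal MDD with y; keep the top_k."""
    scores = np.array([mdd(y, X[:, j]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:top_k], scores
```

A predictor that drives the conditional mean of y receives a markedly larger score than pure-noise columns, so the active variable survives the screen.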
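The slicing-free flavor of (3) can be sketched via the martingale difference divergence matrix (MDDM) that the thesis relates to WIRE: instead of slicing y, eigen-decompose the sample matrix M = -E[(X - EX)(X' - EX')ᵀ |y - y'|]. The exact weighting used by WIRE in the thesis may differ; this is the MDDM kernel, and `wire_directions` is an illustrative name.

```python
import numpy as np

def wire_directions(X, y, d=1):
    """Slicing-free inverse-regression sketch: estimate d mean-dependence
    directions of y on X from the sample MDDM, a p x p PSD matrix."""
    n, p = X.shape
    Xc = X - X.mean(0)
    Dy = np.abs(y[:, None] - y[None, :])        # pairwise |y_i - y_j|
    M = -Xc.T @ Dy @ Xc / n**2                  # sample MDDM
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    _, v = np.linalg.eigh(inv_sqrt @ M @ inv_sqrt)  # M v = lam * cov v
    B = inv_sqrt @ v[:, ::-1][:, :d]            # top-d, original X scale
    return B / np.linalg.norm(B, axis=0)
```

No slicing parameter appears, which is the practical appeal of the slicing-free class; in the thesis, the ladle approach would choose d.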
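Finally, the regression-based CATE estimation in (4) can be sketched in its simplest nonparametric form: under unconfoundedness, tau(x) = E[Y | X = x, D = 1] - E[Y | X = x, D = 0], estimated here by Nadaraya-Watson regression within each treatment arm for a scalar covariate. The dimension reduction structures studied in the thesis would first replace X by its low-dimensional reduction; the bandwidth `h` and helper names are illustrative.

```python
import numpy as np

def nw(x0, X, Y, h):
    """Nadaraya-Watson estimate of E[Y | X = x0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    return (w * Y).sum() / w.sum()

def cate(x0, X, D, Y, h=0.3):
    """Regression-based CATE at x0 under unconfoundedness:
    mu_1(x0) - mu_0(x0), each arm fit separately."""
    t = D == 1
    return nw(x0, X[t], Y[t], h) - nw(x0, X[~t], Y[~t], h)
```

As the thesis's efficiency analysis emphasizes, the bandwidth and kernel chosen here directly drive the asymptotic variance of such an estimator.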
Keywords/Search Tags: Causal inference, Lévy measure, Martingale difference divergence, Partial dimension reduction, Sufficient dimension reduction, Variable screening