
Dimension Folding And Sufficient Variable Selection For Longitudinal Data

Posted on: 2021-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: K Shen
Full Text: PDF
GTID: 2370330626954837
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Longitudinal data combine cross-sectional and time-series information and have attracted wide attention in many fields, including biology, medicine, finance, and economics. With the rapid development of science and technology, however, researchers can often collect large-scale longitudinal data sets, and the growth in data dimension poses great difficulties and challenges for the statistical analysis of longitudinal data. The sufficient dimension folding method preserves the matrix form of the independent variables without losing regression information, which makes it a powerful tool for supervised dimension reduction of matrix-valued independent variables.

Based on distance covariance, this thesis introduces a model-free sufficient dimension folding method for dimension reduction of longitudinal data. When the structural dimension is known, it is proved that, at the population level, the dimension reduction criterion recovers the central dimension folding subspace, namely the central dimension reduction subspace, and thus achieves dimension reduction in both time and variables. At the sample level, the method becomes a constrained high-dimensional optimization problem, and the resulting estimates of the central dimension folding subspace are proved to be root-n consistent. Computationally, by introducing a Kronecker product assumption, the constrained high-dimensional optimization problem is transformed into a low-dimensional problem that can be solved quickly with a mature nonlinear optimization algorithm. To determine the structural dimension, this thesis proposes a modified BIC criterion and proves that the resulting determination of the structural dimension is consistent.

To select the important variables when there are too many independent variables, this thesis proposes a sufficient variable selection method based on penalization. Unlike the existing variable selection methods for longitudinal data in the literature, this method makes no assumptions about the model, which avoids unreasonable statistical inference caused by improper model assumptions. This model-free variable selection method is the first attempt of its kind in the field of longitudinal data. It is proved that, when the tuning parameter is chosen properly, the method has the oracle property. Computationally, this thesis proposes a modified BIC criterion that chooses the tuning parameter adaptively, and proves the consistency of the tuning parameter selection.

To examine the finite-sample performance of the proposed sufficient dimension reduction method (DF-DCOV), two simulation examples are considered: unbalanced longitudinal data with a continuous response variable, and balanced longitudinal data with a discrete response variable. The simulation results show that, compared with the dimension reduction methods in the literature, the DF-DCOV method estimates the central dimension folding subspace more accurately and determines the structural dimension with higher accuracy, regardless of whether the Kronecker product condition is satisfied. Variable selection is then carried out with the sufficient variable selection method (DF-PDCOV) proposed in this thesis. The results show that sufficient variable selection not only improves the accuracy of the estimate of the central dimension folding subspace, but also selects the truly useful subset of variables with high accuracy, and the consistency of variable selection is verified numerically.

Finally, a balanced longitudinal data sample extracted from the Mayo Clinic primary biliary cirrhosis data is used for empirical analysis, and the dimensions of both the independent variables and the time points are reduced to 1. The results show that, among the independent variables, alkaline phosphatase and prothrombin time have a significantly positive relationship with serum bilirubin, and albumin has a significantly negative relationship with serum bilirubin, which is consistent with medical findings. Three independent variables are screened out through sufficient variable selection, and the data are projected into the low-dimensional space, indicating that the method achieves sufficient dimension folding and variable selection at the same time while retaining the original information in the data.
Keywords/Search Tags:Longitudinal data, Distance covariance, Sufficient dimension reduction, Dimension folding, Sufficient variable selection
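
For readers unfamiliar with the two building blocks named in the abstract, the following is a minimal sketch, not the thesis's DF-DCOV implementation. It assumes the standard squared sample distance covariance (the V-statistic form built from double-centered pairwise distances) and a folding of each p x T predictor matrix X into alpha^T X beta. The function names `dcov2`, `fold`, and `centered_distances`, the simulated data, and the chosen bases are all hypothetical illustrations. The last lines check the identity vec(alpha^T X beta) = (beta ⊗ alpha)^T vec(X), which is the kind of Kronecker structure the abstract refers to.

```python
import numpy as np


def centered_distances(z):
    """Double-centered pairwise Euclidean distance matrix of n observations."""
    z = np.asarray(z, dtype=float)
    z = z.reshape(len(z), -1)  # flatten each observation to a vector
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def dcov2(x, y):
    """Squared sample distance covariance (V-statistic form)."""
    return float((centered_distances(x) * centered_distances(y)).mean())


def fold(X, alpha, beta):
    """Fold each p x T matrix X[i] into the d1 x d2 matrix alpha^T X[i] beta."""
    return np.einsum("pk,npt,tl->nkl", alpha, X, beta)


# Hypothetical simulated data: n subjects, p variables, T time points.
rng = np.random.default_rng(0)
n, p, T = 200, 4, 3
X = rng.normal(size=(n, p, T))

alpha_true = np.eye(p)[:, :1]   # first variable carries the signal
beta_true = np.eye(T)[:, :1]    # first time point carries the signal
alpha_bad = np.eye(p)[:, 1:2]   # a direction unrelated to the response

y = fold(X, alpha_true, beta_true)[:, 0, 0] + 0.1 * rng.normal(size=n)

# The informative folding should show stronger dependence with y.
score_true = dcov2(fold(X, alpha_true, beta_true), y)
score_bad = dcov2(fold(X, alpha_bad, beta_true), y)

# Kronecker identity with column-major vec, checked on random bases.
A = rng.normal(size=(p, 2))
B = rng.normal(size=(T, 2))
lhs = (A.T @ X[0] @ B).flatten(order="F")
rhs = np.kron(B, A).T @ X[0].flatten(order="F")
kron_identity_holds = np.allclose(lhs, rhs)
```

In this toy setting, a criterion of this type would search over the bases to maximize the distance covariance between the folded predictor and the response; the constraints, optimizer, and BIC-type criteria used in the thesis are not reproduced here.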