A Dimensionality Reduction Method Based on Martingale Difference Divergence for Longitudinal Data

Posted on: 2022-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: S J Bu
GTID: 2510306476994199
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Longitudinal data combine the characteristics of cross-sectional data and time series and are extremely common in fields such as medicine, biology, and economics. Because repeated observations of the same individual are correlated, handling this intra-individual correlation is an unavoidable problem in longitudinal analysis. Furthermore, with the development of science and technology, the emergence of large longitudinal data sets has brought great difficulties and challenges to the statistical analysis of longitudinal data. The longitudinal sufficient dimension folding method proposed in this dissertation preserves both the internal structure of the data and the regression information about the dependent variable contained in the independent variables, which makes it a powerful method for supervised dimension reduction.

This dissertation proposes a model-free sufficient dimension folding method based on the martingale difference divergence. When the structural dimension is known, it is proved theoretically that the dimension reduction criterion recovers the central mean dimension folding subspace, achieving dimension reduction in both the time direction and the variable direction. Given a sample, the method becomes a constrained high-dimensional optimization problem, and the resulting estimator of the central mean dimension folding subspace is proved to be root-n consistent. For the algorithm, introducing a Kronecker product assumption transforms the constrained high-dimensional optimization problem into a low-dimensional one, so that mature nonlinear optimization algorithms can solve it quickly. The dissertation also proposes a BIC-type criterion that determines the structural dimension adaptively from the data, and proves that this determination is consistent.

To investigate the finite-sample performance of the proposed method, the dissertation examines four simulation examples: (1) balanced longitudinal data with continuous response variables; (2) unbalanced longitudinal data with continuous response variables; (3) longitudinal data with a non-smooth link function; and (4) longitudinal data with discrete response variables. The simulation results show that, regardless of whether the Kronecker product assumption is satisfied, the proposed method estimates the central mean dimension folding subspace more accurately than the dimension reduction methods in the literature and also has a clear advantage in computation speed. The structural dimension is also determined with higher accuracy.

Finally, the dissertation carries out an empirical analysis of primary biliary cirrhosis data, reducing both the independent variables and the time points to one dimension. The results show that, at the 0.05 significance level, alkaline phosphatase and prothrombin time are significantly positively correlated with serum bilirubin, while albumin is significantly negatively correlated with serum bilirubin, which is consistent with the conclusions of the medical literature. Of the four time periods, only the fourth has a significant relationship with serum bilirubin, which is in line with the chronic nature of the disease.
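For readers unfamiliar with the martingale difference divergence (MDD) on which the method is built, the sketch below computes the usual plug-in sample version of the squared MDD of a scalar response Y given a predictor vector X, namely -(1/n^2) * sum over j,k of (Y_j - Ybar)(Y_k - Ybar)|X_j - X_k|. The population quantity is zero exactly when the conditional mean of Y does not depend on X. This is only an illustration of that building block: the function name `mdd_squared` and the toy data are assumptions made here, and the dissertation's folding criterion, its Kronecker product constraint, and its optimization algorithm are not reproduced.

```python
import numpy as np

def mdd_squared(x, y):
    """Plug-in sample squared MDD of scalar y given x (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n observations of one predictor
    n = x.shape[0]
    # Pairwise Euclidean distances |X_j - X_k|, an n-by-n matrix.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Center the response, then form the quadratic form in the distances.
    yc = y - y.mean()
    return -(yc @ dist @ yc) / n ** 2

# Toy data (assumed for illustration only).
rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)
y_dep = x ** 2                  # conditional mean of y depends on x
y_ind = rng.standard_normal(n)  # generated independently of x

print("mean-dependent response:", mdd_squared(x, y_dep))
print("independent response:  ", mdd_squared(x, y_ind))
```

In this toy run the first value should be clearly larger than the second, even though x and x^2 are uncorrelated, which is the kind of nonlinear mean dependence a model-free criterion needs to detect.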
Keywords/Search Tags: longitudinal data, martingale difference divergence, sufficient dimension reduction, dimension folding