
A Martingale-difference-based Variable Selection Method For Longitudinal Data

Posted on: 2022-09-11  Degree: Master  Type: Thesis
Country: China  Candidate: Q H Chang  Full Text: PDF
GTID: 2510306476494204  Subject: Probability theory and mathematical statistics
Abstract/Summary:
Longitudinal data, a special type of data that combines cross-sectional and time-series information, arise frequently in modern applied disciplines such as biology, medicine, and sociology. Such data sets are often large, and the observations exhibit both overall characteristics and correlation across time. In practice, treating the predictors of longitudinal data as a matrix not only retains their original matrix structure and the explanatory information it carries, but also reduces the number of parameters to be estimated and improves estimation accuracy. In this setting, the sufficient dimension folding method treats time as one folding dimension and the multivariate feature variables as the other. This approach models time points and feature variables jointly, so it preserves the matrix-valued form of the predictor while retaining the regression information of the response on the predictor.

When the predictor of longitudinal data is high-dimensional, the results of standard sufficient dimension folding are difficult to interpret, because each estimated linear combination usually involves all of the original predictors. To address this, we draw on the idea of regularization and propose a model-free variable selection method for longitudinal data based on the martingale difference divergence. The method removes irrelevant and redundant variables while achieving sufficient dimension reduction. Under regularity conditions, and using manifold theory and techniques, we prove that the proposed method possesses the oracle property. For the choice of tuning parameters, we propose a BIC-type criterion that selects them adaptively, and we prove that this selection is consistent.

To evaluate the finite-sample performance of the proposed sufficient variable selection method, we examine two simulation examples: balanced and unbalanced longitudinal data that satisfy the Kronecker product hypothesis, and balanced and unbalanced longitudinal data under the Kronecker product hypothesis. Since this is the first work on model-free variable selection for longitudinal data, we compare its performance with that of a sufficient variable selection method designed for independent data. The simulation results show that the proposed method selects the truly relevant variables, and produces sparse dimension reduction estimates, more accurately and more stably than the comparison method, which numerically supports the oracle property of the proposed variable selection method.

Finally, we conduct an empirical analysis of the longitudinal samples with complete observation records in the Mayo Clinic primary biliary cirrhosis data. Our method reduces both the feature dimension and the time dimension to 1 and selects two predictors. The sparse estimate shows that the feature variables alkaline phosphatase and prothrombin time are positively correlated with serum bilirubin, which is consistent with the conclusions of the related literature. Scatter plots after dimension reduction and variable selection show that the method achieves sufficient dimension folding and variable selection simultaneously while retaining the original information in the data. We further fit nonparametric models based on the variables selected by the proposed method and by the methods in the simulation comparison; boxplots of the mean squared error (MSE) show that our method substantially improves the accuracy of the subsequent prediction model.
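The dimension folding idea described above can be illustrated with a small numerical check. A common formulation, assumed here rather than taken from the thesis, reduces a p x T matrix predictor X (p feature variables observed at T time points) to A^T X B, where A (p x d1) acts on the feature dimension and B (T x d2) acts on the time dimension. Because vec(A^T X B) = (B ⊗ A)^T vec(X), this is the same as reducing vec(X) with a Kronecker-structured basis, which needs p*d1 + T*d2 free parameters instead of p*T*d1*d2. The NumPy sketch below checks that identity on random data; the dimensions and the helper `vec` are illustrative choices, not part of the thesis.

```python
import numpy as np


def vec(M):
    """Column-stacking vectorization (column-major, i.e. Fortran order)."""
    return M.reshape(-1, order="F")


rng = np.random.default_rng(0)
p, T = 5, 4    # illustrative: 5 feature variables observed at 4 time points
d1, d2 = 2, 1  # illustrative folded dimensions (feature, time)

X = rng.standard_normal((p, T))   # one matrix-valued predictor
A = rng.standard_normal((p, d1))  # reduces the feature dimension
B = rng.standard_normal((T, d2))  # reduces the time dimension

folded = A.T @ X @ B                 # d1 x d2 folded predictor
via_kron = np.kron(B, A).T @ vec(X)  # same reduction applied to vec(X)

print(np.allclose(vec(folded), via_kron))  # True
# Free parameters: Kronecker-structured basis vs. unstructured basis for vec(X)
print(p * d1 + T * d2, p * T * d1 * d2)    # 14 40
```

Setting d1 = d2 = 1 gives the kind of fully folded one-dimensional summary reported for the biliary cirrhosis data, where both the feature and time dimensions are reduced to 1.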
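The martingale difference divergence MDD(Y | X) measures the conditional mean dependence of a response Y on a predictor X: under mild moment conditions it is zero if and only if E(Y | X) = E(Y) almost surely, which is what makes it usable for model-free screening of predictors. The sketch below computes the usual V-statistic estimate of MDD(Y | X)^2 in two equivalent ways: from double-centered matrices with a_kl = ||X_k - X_l|| and b_kl = |Y_k - Y_l|^2 / 2, and from the simplified form -(1/n^2) * sum_{k,l} (Y_k - Ybar)(Y_l - Ybar) ||X_k - X_l||. It illustrates only the divergence itself; how the thesis embeds MDD in its regularized dimension folding estimator is not described in the abstract, and the function names here are hypothetical.

```python
import numpy as np


def _double_center(D):
    """Subtract row and column means, then add back the grand mean."""
    return D - D.mean(axis=0, keepdims=True) - D.mean(axis=1, keepdims=True) + D.mean()


def _pairwise_dist(x):
    """Euclidean distance matrix ||X_k - X_l|| for an (n, q) array."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


def mdd_sq(x, y):
    """V-statistic estimate of MDD(Y | X)^2 via double-centered matrices.

    x: (n, q) predictors (a 1-D array is treated as q = 1); y: (n,) response.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    a = _pairwise_dist(x)                     # a_kl = ||X_k - X_l||
    b = 0.5 * (y[:, None] - y[None, :]) ** 2  # b_kl = |Y_k - Y_l|^2 / 2
    return float(np.mean(_double_center(a) * _double_center(b)))


def mdd_sq_direct(x, y):
    """Equivalent form: -(1/n^2) * sum_{k,l} (Y_k - Ybar)(Y_l - Ybar) ||X_k - X_l||."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    yc = y - y.mean()
    return float(-np.mean(np.outer(yc, yc) * _pairwise_dist(x)))


# Toy comparison: the conditional mean of y_dep depends on X, that of y_ind does not.
rng = np.random.default_rng(1)
n = 200
x = rng.standard_normal((n, 3))
y_dep = x[:, 0] ** 2 + 0.1 * rng.standard_normal(n)
y_ind = rng.standard_normal(n)
print(mdd_sq(x, y_dep), mdd_sq(x, y_ind))
print(np.isclose(mdd_sq(x, y_dep), mdd_sq_direct(x, y_dep)))
```

The two forms agree because double-centering b_kl leaves exactly -(Y_k - Ybar)(Y_l - Ybar), and a double-centered matrix has zero row and column sums. Note that y_dep has essentially zero linear correlation with x[:, 0], yet its MDD is clearly larger than that of y_ind, which is the model-free behavior the abstract relies on.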
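The abstract states that the tuning parameter is chosen by a BIC-type criterion but does not give its form. As a generic illustration only, the sketch below assumes a row-sparse estimate obtained by group soft-thresholding the rows of a hypothetical unpenalized basis estimate `B_hat` (one row per predictor), and scores each candidate λ by a squared-error fit term plus (number of free parameters) * log(n) / n. The criterion, the thresholding rule, the degrees-of-freedom count, and the names `group_soft_threshold`, `bic_type`, and `select_lambda` are all assumptions, not the thesis's estimator.

```python
import numpy as np


def group_soft_threshold(B, lam):
    """Shrink each row of B toward zero; rows with norm <= lam become exactly zero."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.clip(1.0 - lam / np.maximum(norms, 1e-12), 0.0, None)
    return B * scale


def bic_type(B_hat, B_lam, n):
    """Hypothetical BIC-type score: squared-error fit + df * log(n) / n."""
    fit = float(np.sum((B_hat - B_lam) ** 2))
    active = np.count_nonzero(np.linalg.norm(B_lam, axis=1))
    df = active * B_hat.shape[1]  # simplified count of free parameters
    return fit + df * np.log(n) / n


def select_lambda(B_hat, lambdas, n):
    """Return the grid value minimizing the BIC-type score and its sparse estimate."""
    scores = [bic_type(B_hat, group_soft_threshold(B_hat, lam), n) for lam in lambdas]
    best = int(np.argmin(scores))
    return lambdas[best], group_soft_threshold(B_hat, lambdas[best])


# Toy example: only the first 2 of p = 8 predictors carry signal.
rng = np.random.default_rng(2)
n, p, d = 200, 8, 1
B_true = np.zeros((p, d))
B_true[:2] = 1.0
B_hat = B_true + rng.standard_normal((p, d)) / np.sqrt(n)  # noisy stand-in estimate
lambdas = np.linspace(0.0, 0.5, 51)
lam, B_sparse = select_lambda(B_hat, lambdas, n)
print(lam, np.flatnonzero(np.linalg.norm(B_sparse, axis=1)))
```

Thresholding a preliminary estimate is used here only to keep the example short; a penalized estimator would normally place the penalty inside the estimating objective itself. The log(n)/n factor is what distinguishes a BIC-type rule from an AIC-type one and is the usual source of selection consistency results.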
Keywords/Search Tags: Sufficient variable selection, Longitudinal data, Martingale difference divergence, Dimension folding, Oracle property