Estimation, Clustering And Variable Selection For Heterogeneous Models And Joint Modeling Of Longitudinal And Survival Data

Posted on:2023-10-21

Degree:Doctor

Type:Dissertation

Country:China

Candidate:F F Wang

Full Text:PDF

GTID:1520306905471714

Subject:Probability theory and mathematical statistics

Abstract/Summary:

In recent decades, with the rapid development of science and technology, data in biology, medicine, information technology, finance and marketing have grown rapidly in both volume and variety. Accurately extracting the important information from such large amounts of data is therefore particularly important. A first step is to cluster the data and then to model and estimate within the resulting subgroups. In addition, the frequent appearance of high-dimensional data in many fields invalidates many traditional statistical methods. Motivated by these clustering and high-dimensional problems, this dissertation focuses on clustering and estimation for heterogeneous data and on variable selection for high-dimensional data in biostatistics. For the clustering analysis of heterogeneous data, we consider the clustering and estimation problem for the partially heterogeneous single-index model and for the single-index model with heterogeneous intercepts. For variable selection with high-dimensional data, we consider a joint model for longitudinal and survival data in biostatistics, and perform estimation and variable selection for this joint model.

The dissertation is divided into four chapters. Chapter 1 briefly introduces the concepts and definitions of the related models, clustering methods, classical variable selection methods and algorithms used in this dissertation. Chapters 2 and 3 both address clustering and estimation for heterogeneous data: Chapter 2 proposes an estimation and clustering method for partially heterogeneous single-index models, and Chapter 3 proposes an estimation and clustering method for single-index models with heterogeneous intercepts. Chapter 4 performs variable selection for each parameter of the joint model of longitudinal and survival data, with particular attention to the selection of longitudinal variables. Chapters 2, 3 and 4 are summarized below.

Chapter 2: We propose a new estimation and clustering method for partially heterogeneous single-index models. Ma and Huang (2017) and Ma et al. (2020) both considered the estimation and clustering problem for heterogeneous models: Ma and Huang (2017) considered a linear model with heterogeneous intercepts, and Ma et al. (2020) considered a partially heterogeneous linear model. In both models the heterogeneity lies in the parametric part, and heterogeneity in semiparametric models has not yet been studied. In this chapter, we study the clustering and estimation problem for the partially heterogeneous single-index model. Solving the objective function of the heterogeneous model directly is very difficult. Motivated by Wang et al. (2015), we transform the objective function into a least-squares optimization problem by exploiting the characteristics of the index parameters in the single-index model. From this optimization problem, the homogeneous parameter and the subgroup-averages of the heterogeneous index directions can be estimated simultaneously. We then substitute the homogeneous parameter estimator into the optimization problem and build a new optimization problem using the concave pairwise fusion penalty method. We solve the new problem with the alternating direction method of multipliers (ADMM; Boyd et al., 2011), which identifies the subgroup structure of the heterogeneous index directions. Under certain conditions, we prove the asymptotic normality of the homogeneous parameter estimator and of the heterogeneous index direction estimators, and the consistency of the identified subgroups. The homogeneous parameter estimated by the new method does not rely on a sparsity assumption on the heterogeneous parameters. In addition, the method of Wang et al. (2015) is generalized and applied to the heterogeneous model. Simulations illustrate the excellent performance of the new method.

Chapter 3: We propose a new estimation and clustering method for the single-index model with heterogeneous intercepts. This model is similar to the partially heterogeneous single-index model of Chapter 2, and both are semiparametric. The difference is that here the heterogeneity lies in the intercepts, whereas in Chapter 2 it lies in the semiparametric part. In addition, Chapter 2 requires both the homogeneous parameter and the heterogeneous index direction to be low-dimensional, and it does not estimate the unknown link function of the single-index part. The method proposed in this chapter not only estimates and clusters the heterogeneous intercepts, but also estimates the unknown link function and the index direction of the single-index part, and it imposes no constraint on the dimension of the index parameter. We first use the B-spline method to approximate the single-index part of the model. Based on this B-spline approximation, the objective function built with the concave pairwise fusion penalty method is transformed into a parametric optimization problem, which we then solve with the ADMM algorithm to obtain the estimates and the clustering. Within the ADMM iterations, the Nadaraya-Watson (N-W) method is used to estimate the link function. A reasonable initial value is also very important for starting the ADMM algorithm. We obtain the initial value with an iterative method similar to the estimating procedure for the single-index model (Lv et al., 2015). Simulation studies show the excellent performance of the new method for the clustering and estimation of the heterogeneous intercepts and for the estimation of the homogeneous index direction and the link function.

Chapter 4: In biostatistics, most joint models use mixed-effects models for the longitudinal data and Cox proportional hazards models for the survival time (Wulfsohn and Tsiatis, 1997; Ibrahim et al., 2004). Such joint models have been widely studied, but there has been little research on high-dimensional variable selection for them. He et al. (2015) and Chen and Wang (2017) both proposed variable selection methods for joint models, but He et al. (2015) involves only one repeated-measures biomarker and the survival time, and Chen and Wang (2017) performed variable selection only for the random effects and covariance matrices, using a Lasso penalty. We propose a new estimation and variable selection method for joint models of multivariate longitudinal and survival data. We perform variable selection for both the random effects and the fixed effects in the joint model. For the random effects we use a Group lasso penalty function, which differs from the penalty in Chen and Wang (2017). This penalty is meaningful in some application areas. For example, if a disease is unrelated to a person's weight, then the several genes that control weight have no effect on the disease, and the Group lasso penalty lets us remove those genes from the model together. In this chapter, we build an objective function for the joint model using the penalized likelihood method. Since the penalized likelihood involves complicated integrals without closed forms, we use a numerical estimation method based on the Laplace approximation, as in Chen and Wang (2017). We then solve the penalized likelihood problem with the fast iterative shrinkage-thresholding algorithm (FISTA; Beck and Teboulle, 2009), using a backtracking line search to choose the step size. FISTA, which can be viewed as an extension of the classical gradient algorithm, is computationally simple and has a global convergence rate of O(1/k^2) in the objective value. Simulations show the excellent performance of the new method for estimation and selection in the joint model. To further illustrate the new method, we carry out a detailed analysis of an observational study of patients with primary biliary cirrhosis (PBC) of the liver from the Mayo Clinic trial, where our method also performs excellently in estimation and selection.
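
The FISTA iteration with backtracking line search and a Group lasso penalty mentioned for Chapter 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the smooth term is a plain least-squares loss standing in for the Laplace-approximated negative log-likelihood, and every name here (fista_group_lasso, group_prox, lam, eta) is hypothetical. It uses only the Python standard library.

```python
import math


def matvec(X, b):
    """Matrix-vector product for a list-of-rows matrix X."""
    return [sum(xij * bj for xij, bj in zip(row, b)) for row in X]


def loss(X, y, b):
    """Smooth part f(b) = 0.5 * ||y - X b||^2 (assumed stand-in loss)."""
    return 0.5 * sum((yi - pi) ** 2 for yi, pi in zip(y, matvec(X, b)))


def grad(X, y, b):
    """Gradient of the smooth part: X^T (X b - y)."""
    r = [pi - yi for pi, yi in zip(matvec(X, b), y)]
    return [sum(row[j] * ri for row, ri in zip(X, r)) for j in range(len(b))]


def group_prox(v, groups, thresh):
    """Group soft-thresholding: proximal operator of thresh * sum_g ||v_g||_2."""
    out = list(v)
    for g in groups:
        norm = math.sqrt(sum(v[j] ** 2 for j in g))
        scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
        for j in g:
            out[j] = scale * v[j]
    return out


def fista_group_lasso(X, y, groups, lam, n_iter=200, L=1.0, eta=2.0):
    """Minimize f(b) + lam * sum_g ||b_g||_2 by FISTA with backtracking.

    L is the running estimate of the gradient's Lipschitz constant and is
    multiplied by eta > 1 until the quadratic upper bound holds.
    """
    b = [0.0] * len(X[0])
    z = list(b)  # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        g = grad(X, y, z)
        fz = loss(X, y, z)
        while True:  # backtracking line search on the step size 1/L
            step = [zj - gj / L for zj, gj in zip(z, g)]
            b_new = group_prox(step, groups, lam / L)
            d = [bn - zj for bn, zj in zip(b_new, z)]
            quad = (fz + sum(gj * dj for gj, dj in zip(g, d))
                    + 0.5 * L * sum(dj * dj for dj in d))
            # small tolerance guards against floating-point ties
            if loss(X, y, b_new) <= quad + 1e-12 * max(1.0, abs(fz)):
                break
            L *= eta
        t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        z = [bn + (t - 1.0) / t_new * (bn - bo) for bn, bo in zip(b_new, b)]
        b, t = b_new, t_new
    return b
```

One way to check such a sketch is an orthogonal design, where the minimizer has a closed form: each group of the least-squares solution is shrunk toward zero as a block, and a group whose norm falls below the threshold is set to zero entirely. That is the behavior that removes a whole set of related variables at once.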

Keywords/Search Tags:

Heterogeneity, single index model, subgroup-average, clustering, heterogeneous intercepts, B-spline method, ADMM, joint model, high-dimensional variable selection, Group lasso, FISTA

Related items

1	B-spline Estimation Of Single Index Variable Coefficient Model
2	Variable Selection Of Partial Linear Single Index Model Based On Lasso Method
3	Lasso-type Approach For Variable Selection In Single Index Model
4	Estimation And Application Of Single Index Variable Coefficient Model In Spatial Metrology
5	Variable Selection For Single-index Model With M-estimated
6	Quantile Regression And Variable Selection Of Semiparametric Regression Models With Index
7	Robust Variable Selection For Constrained High-dimensional Model And Classification Under Distribution Heterogeneity
8	Robust Variable Selection Of Varying Coefficient Models
9	Variable Selection Method For Joint Model Of Longitudinal And Survival Data And Its Application In Clinical Data Analysis
10	Variable Selection Of High-dimensional Mixture Model