Interval-censored data and ROC (receiver operating characteristic) analysis arise frequently in many scientific fields, such as demographic studies, medical studies, and sociology (Finkelstein, 1986; Hand and Till, 2001; Sun, 2006). In this paper we discuss three topics in survival analysis. By interval-censored data, we mean that the failure time of interest $T$ cannot be observed exactly but is only known to fall within a time interval. Interval-censored data mainly include case I, case II, and case K interval-censored data. Case II interval-censored data arise when each subject is observed twice during the study and the event of interest is known to occur before the first observation time, between the two observation times, or after the second observation time. Case K interval-censored data arise when each subject is observed $K$ times and the event of interest is known to occur within one of the resulting intervals. The ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. We apply the smoothed empirical likelihood approach to the optimal cut point defined by three criteria: the Youden index, the closest-to-(0,1) index, and the concordance probability index. Based on the optimal cut point, one diagnoses whether or not the disease has occurred.

As mentioned above, several authors have discussed variable selection for interval-censored failure time data (Li et al., 2020; Zhao et al., 2020). However, the existing methods cannot be applied directly to linear models or generalized linear models. In the variable selection procedure proposed below, we employ the unbiased transformation approach, whose main idea is to convert the two variables representing the interval-censored data into a single new variable with the same conditional expectation.

Consider a failure time study that consists of $n$ independent subjects, and for subject $i$ let $T_i$ denote the failure time of interest and $X_i$ a $p$-dimensional vector of covariates. Suppose that for each $T_i$, two observations are available at the observation times $U_i$ and $V_i$, which divide the axis $(0,\infty)$ into the three parts $[0,U_i)$, $[U_i,V_i]$ and $(V_i,\infty)$, and we know which part $T_i$ falls in. Thus the observed data on subject $i$ have the form $O_i=\{U_i,V_i,\delta_{1i},\delta_{2i},X_i\}$, where $\delta_{1i}=I(T_i\le U_i)$ and $\delta_{2i}=I(U_i<T_i\le V_i)$, with $I(\cdot)$ the indicator function, $i=1,\dots,n$. In the following, we assume that $T_i$ is independent of $U_i$ and $V_i$ given $X_i$. To describe the covariate effects, we assume that given $X_i$, $T_i$ follows the model
$$H(T_i)=\beta_0+X_i^{\mathrm T}\beta+\varepsilon_i,\qquad(1)$$
where $H(\cdot)$ is a known function, $\varepsilon_i$ is a zero-mean random error with unknown distribution, and $\beta_0$ and $\beta$ are unknown parameters. Note that if $T_i$ were exactly observed, a simple approach to estimating the model above would be to take $H(T_i)$ as a new response variable and treat model (1) as a linear model. For the situation considered here, where $T_i$ is interval-censored, this approach cannot be carried out directly. To overcome the problem, we adopt the unbiased transformation approach and first convert $H(T_i)$ into the variable
$$h_i^{*}=\varphi_1(U_i,V_i)\,\delta_{1i}+\varphi_2(U_i,V_i)\,\delta_{2i}+\varphi_3(U_i,V_i)\,(1-\delta_{1i}-\delta_{2i})+H(0),$$
where $\varphi_1(\cdot,\cdot)$, $\varphi_2(\cdot,\cdot)$ and $\varphi_3(\cdot,\cdot)$ are continuous functions with finite continuous partial derivatives that do not depend on the distribution of $T_i$. One can see that $\varphi_1(U_i,V_i)$, $\varphi_2(U_i,V_i)$ and $\varphi_3(U_i,V_i)$ depend not only on $U_i$ and $V_i$ but also on the joint density $g(U_i,V_i)$ of the observation times. More specifically, one can rewrite $h_i^{*}$ as
$$h_i^{*}(U_i,V_i,g(U_i,V_i),\delta_{1i},\delta_{2i})=\varphi_1(U_i,V_i,g(U_i,V_i))\,\delta_{1i}+\varphi_2(U_i,V_i,g(U_i,V_i))\,\delta_{2i}+\varphi_3(U_i,V_i,g(U_i,V_i))\,(1-\delta_{1i}-\delta_{2i})+H(0).$$
Thus, if $g(U_i,V_i)$ were known, it would be natural to estimate model (1), or $\beta$, by the least squares method, minimizing the mean squared residuals after the unbiased transformation.
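To make the transformation concrete, here is a minimal Python sketch of the computation of the $h_i^{*}$'s. The functions `phi1`, `phi2` and `phi3` are hypothetical placeholders: their exact form depends on $H(\cdot)$ and on the density $g$ of the observation times, and is not specified above.

```python
import numpy as np

def unbiased_transform(U, V, d1, d2, phi1, phi2, phi3, H0=0.0):
    """Sketch of the unbiased transformation h* for interval-censored data.

    U, V       : arrays of the two observation times per subject
    d1, d2     : indicator arrays, d1 = I(T <= U), d2 = I(U < T <= V)
    phi1..phi3 : hypothetical callables of (U, V), chosen so that
                 E[h* | X] = E[H(T) | X]; in general they also involve
                 the joint density g of (U, V)
    H0         : the constant H(0) in the transformation
    """
    return (phi1(U, V) * d1
            + phi2(U, V) * d2
            + phi3(U, V) * (1.0 - d1 - d2)
            + H0)

# With the transformed responses in hand, beta can be estimated by
# ordinary least squares of h* on the covariates (intercept included), e.g.
# design = np.column_stack([np.ones(len(h_star)), X])
# beta_hat = np.linalg.lstsq(design, h_star, rcond=None)[0]
```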
Of course, in practice $g(U_i,V_i)$ is unknown, and we propose to first estimate it by a kernel density estimator $\hat g(U_i,V_i)$ and then work with
$$\hat h_i^{*}=\varphi_1(U_i,V_i,\hat g(U_i,V_i))\,\delta_{1i}+\varphi_2(U_i,V_i,\hat g(U_i,V_i))\,\delta_{2i}+\varphi_3(U_i,V_i,\hat g(U_i,V_i))\,(1-\delta_{1i}-\delta_{2i})+H(0).$$
Note that $\hat h_i^{*}$ given above involves the estimation of the two-dimensional density $g$; according to Section 5 of Deng (2004), one can equivalently replace it by
$$\hat h_i^{*}=\varphi_1(U_i,V_i,\hat g_u(U_i),\hat g_v(V_i))\,\delta_{1i}+\varphi_2(U_i,V_i,\hat g_u(U_i),\hat g_v(V_i))\,\delta_{2i}+\varphi_3(U_i,V_i,\hat g_u(U_i),\hat g_v(V_i))\,(1-\delta_{1i}-\delta_{2i})+H(0),$$
where $\hat g_u(U_i)$ and $\hat g_v(V_i)$ denote the kernel estimators of the marginal density functions $g_u(U_i)$ and $g_v(V_i)$.

Define $M(\beta)$ to be the resulting least squares criterion based on the $\hat h_i^{*}$'s. Then at the $k$th iteration, $\beta_j^{(k)}$ is determined by minimizing $Q(\beta_j)=M(\beta_j)+p_\lambda(|\beta_j|)$. Following the local quadratic approximation idea discussed in Fan and Li (2001), we have
$$[p_\lambda(|\beta_j|)]'=p_\lambda'(|\beta_j|)\,\mathrm{sgn}(\beta_j)\approx\big\{p_\lambda'(|\beta_j^{(k-1)}|)/|\beta_j^{(k-1)}|\big\}\,\beta_j\quad\text{for }\beta_j\neq 0.$$
In other words, $p_\lambda(|\beta_j|)$ can be locally approximated by a quadratic function at $|\beta_j^{(k-1)}|$ as
$$p_\lambda(|\beta_j|)\approx p_\lambda(|\beta_j^{(k-1)}|)+\tfrac12\big\{p_\lambda'(|\beta_j^{(k-1)}|)/|\beta_j^{(k-1)}|\big\}\big[\beta_j^2-(\beta_j^{(k-1)})^2\big].$$
Meanwhile, $M(\beta_j)$ can be approximated by its second-order Taylor expansion, where $M'$ and $M''$ denote the first and second derivatives of $M(\cdot)$, respectively. Therefore, minimizing $Q(\beta_j)$ is equivalent to minimizing a quadratic function with respect to $\beta_j$, which yields the closed-form solution (3). It is easy to see that the resulting solution (3) and the approximation used above apply to any penalty function. The BAR penalty, however, requires no local quadratic approximation, since it is already a quadratic function of the coefficients. In that case we can obtain the closed-form iterative solution proposed by Wu et al. (2020) in terms of $Q'(\beta_j^{(k-1)})$ and $Q''(\beta_j^{(k-1)})$, the first and second derivatives of $Q(\beta_j)$ with respect to $\beta_j$ evaluated at $\beta_j^{(k-1)}$.

Theorem 1. Assume that Conditions C1-C8 given in §2.3 hold and that $\varphi_1(\cdot,\cdot)$, $\varphi_2(\cdot,\cdot)$ and $\varphi_3(\cdot,\cdot)$ are continuous functions with finite continuous partial derivatives. Then we have:
(i) The fixed point of $f(\alpha)=\{X_\alpha^{\mathrm T}X_\alpha+\lambda_n D_1(\alpha)\}^{-1}X_\alpha^{\mathrm T}Y^{*}$ exists and is unique, where $D_1(\alpha)=\mathrm{diag}(\alpha_1^{-2},\dots,\alpha_{q_n}^{-2})$.
(ii) (Oracle property) The BAR estimator $\hat\beta^{*}=(\hat\beta_1^{*\mathrm T},\hat\beta_2^{*\mathrm T})^{\mathrm T}$ exists and is unique, where $\hat\beta_2^{*}=0$ and $\hat\beta_1^{*}$ is the unique fixed point of $f(\alpha)$.
(iii) (Asymptotic normality) $\sqrt{n}\,(\hat\beta_1^{*}-\beta_{01})\rightarrow_d N(0,\Sigma^{(1)})$, where $\Sigma^{(1)}$ is defined in §2.6.
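As an illustration of the fixed point appearing in Theorem 1, the following sketch iterates the mapping $f(\alpha)=\{X^{\mathrm T}X+\lambda D_1(\alpha)\}^{-1}X^{\mathrm T}Y^{*}$ until convergence. The ridge initializer, the numerical safeguard `eps`, and the final thresholding step are illustrative implementation choices, not part of the theory.

```python
import numpy as np

def bar_fixed_point(X, y, lam, n_iter=100, tol=1e-8, eps=1e-10):
    """Sketch of the BAR fixed-point iteration
    f(a) = (X'X + lam * D1(a))^{-1} X'y with D1(a) = diag(a_j^{-2}).

    X   : (n, p) design matrix
    y   : (n,) transformed responses (e.g. the h* values)
    lam : BAR tuning parameter lambda_n
    """
    p = X.shape[1]
    XtX, Xty = X.T @ X, X.T @ y
    # start from a ridge estimator so that every component is nonzero
    beta = np.linalg.solve(XtX + lam * np.eye(p), Xty)
    for _ in range(n_iter):
        D = np.diag(1.0 / np.maximum(beta**2, eps))  # guard divide-by-zero
        beta_new = np.linalg.solve(XtX + lam * D, Xty)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    beta[np.abs(beta) < 1e-6] = 0.0  # report tiny coefficients as exact zeros
    return beta
```

The repeated ridge solve shrinks small coefficients toward zero while leaving large ones essentially unpenalized, which is the mechanism behind the oracle property in Theorem 1.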
As mentioned above, a great deal of literature exists on variable selection for univariate right-censored data, where the presence of censoring makes the problem much more difficult and challenging than for complete data. Several methods are also available for variable selection based on bivariate or multivariate right-censored failure time data (Cai et al., 2005; Zhang and Lu, 2007; Liu et al., 2015). For example, Cai et al. (2005) proposed a penalized pseudo-partial likelihood method. Consider a failure time study that consists of $n$ independent subjects in which each subject can experience two related failure events of interest. For subject $i$, let $T_{ij}$ denote the $j$th failure time of interest and $X_i$ a $p_n$-dimensional vector of covariates, $j=1,2$, $i=1,\dots,n$. For the covariate effects, we assume that given $X_i$ and a latent variable $\eta_i$ with mean zero, the cumulative hazard function of $T_{ij}$ has the form
$$\Lambda_{ij}(t\mid X_i,\eta_i)=G_j\big(\Lambda_{0j}(t)\,e^{X_i^{\mathrm T}\beta+\eta_i}\big).\qquad(5)$$
Here $G_j$ denotes a pre-specified increasing function with $G_j(0)=0$, $\Lambda_{0j}(t)$ is an unknown increasing, nonnegative function, and $\beta$ is a $p_n$-dimensional vector of regression parameters. Note that in the model above, for simplicity, we assume that the covariates affecting the two failure times of interest are the same and that they have the same effects on both failure times. It is straightforward to generalize the presented method to the situation where they differ, and some comments on this are given below. In the following, it will be supposed that the main focus is the determination of the important or relevant covariates or predictors, as well as the estimation of their effects.

Model (5) without the latent variable is commonly referred to as the semiparametric transformation model (Chen et al., 2002; Chen et al., 2012), and as mentioned above, it has the advantage of flexibility since it includes many commonly used models as special cases. For example, it gives the proportional hazards model if $G_j(x)=x$, and one obtains the proportional odds model by taking $G_j(x)=\log(1+x)$. It is easy to see that the latent variable $\eta_i$ characterizes the correlation between the two failure times of interest, and in the following we assume that given $X_i$ and $\eta_i$, $T_{i1}$ and $T_{i2}$ are independent. For the observed data, suppose that one only observes an interval $(L_{ij},R_{ij}]$ such that $T_{ij}\in(L_{ij},R_{ij}]$, $j=1,2$, $i=1,\dots,n$; that is, only bivariate interval-censored data are available. Let $\theta=(\beta^{\mathrm T},\gamma)^{\mathrm T}$ and $\Lambda_0=(\Lambda_{01},\Lambda_{02})$, and assume that the censoring is independent (Sun, 2006). Then the observed likelihood is an integral over the latent variable, where $g(\eta_i\mid\gamma)$ denotes the density function of the $\eta_i$'s, which is assumed to be known up to the parameter $\gamma$. For the selection of the important covariates, we propose to adopt the penalized likelihood approach and to maximize the penalized log-likelihood function, in which $p_\lambda(|\beta_k|)$ denotes a penalty function with $\lambda$ being a tuning parameter. For the maximization of $L_{pn}(\theta,\Lambda_0)$, it is apparent that the resulting estimators have no closed forms and some iteration is needed; this is in fact true even without the penalty function (Sun and Ding, 2019; Zhou et al., 2017). Before presenting the EM algorithm developed with the use of the Poisson-variable-based data augmentation, we first establish the oracle property of the approach under the BAR penalty function $p_\lambda(|\beta_k|)=\lambda\beta_k^{2}/\tilde\beta_k^{2}$, where $\tilde\beta_k$ denotes a nonzero initial estimator of $\beta_k$. For inference about $\beta_{01}$, it is apparent that one needs to estimate the covariance matrix of $\hat\beta_1$. One can see from Chapter 3 that it would be difficult to derive a consistent estimator of $\Sigma$, and thus we suggest employing the nonparametric bootstrap procedure.

Now we discuss the maximization of the penalized log-likelihood function $\log L_{pn}(\theta,\Lambda_0)$. For this, first note that the transformation function $G_j$ can be expressed as
$$G_j(x)=-\log\int_0^\infty e^{-xt}\,f_j(t\mid r_j)\,dt$$
with respect to a density function $f_j(t\mid r_j)$. If $f_j(t\mid r_j)$ is the density function of a gamma variable with mean 1 and variance $r_j$, we have $G_j(x)=\log(1+r_jx)/r_j$, the logarithmic transformation family, which gives the proportional hazards or proportional odds model with $r_j=0$ or $r_j=1$, respectively.
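A brief sketch of this logarithmic transformation family, together with the cumulative hazard it induces in model (5), may help fix ideas. The function names here are ours, and $r=0$ is handled as the limiting proportional hazards case.

```python
import numpy as np

def G_log_family(x, r):
    """Logarithmic transformation family G(x) = log(1 + r*x) / r.

    r = 0 corresponds to the proportional hazards model (G(x) = x,
    the r -> 0 limit) and r = 1 to the proportional odds model
    G(x) = log(1 + x).
    """
    x = np.asarray(x, dtype=float)
    return x if r == 0.0 else np.log1p(r * x) / r

def cumulative_hazard(Lambda0_t, x, beta, eta, r):
    """Cumulative hazard of model (5): G(Lambda0(t) * exp(x'beta + eta))."""
    return G_log_family(Lambda0_t * np.exp(x @ beta + eta), r)
```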
With this representation, the likelihood function given in (2.2) can be rewritten as an average over latent variables $\xi_{ij}$ with density $f_j(\cdot\mid r_j)$, where $\Delta_{ij}=I(R_{ij}<\infty)$. For each $j$, let $0=t_{j0}<t_{j1}<\cdots<t_{jm_j}<\infty$ denote the distinct and ordered values of all the $L_{ij}$'s and the finite $R_{ij}$'s. With respect to the estimation of $\Lambda_{0j}$, we adopt the nonparametric approach and treat it as a step function with jumps $c_{jl}$ only at the $t_{jl}$'s. Then we can rewrite (9) with $\Lambda_0=(c_{11},\dots,c_{1m_1},c_{21},\dots,c_{2m_2})$. To develop the EM algorithm, note that if the latent variables $\eta_i$ and $\xi_{ij}$ were known, we would have the corresponding complete-data likelihood function.

To further augment the observed data, define a mapping between $\Delta_{ij}$ and a new latent variable $Z_{ij}$ by $\Delta_{ij}=I(Z_{ij}>0)$, where $Z_{ij}=\sum_{l}Z_{ijl}$ with the $Z_{ijl}$'s being independent Poisson random variables with means $\xi_{ij}c_{jl}\exp(X_i^{\mathrm T}\beta+\eta_i)$, $i=1,\dots,n$, $j=1,2$, $l=1,\dots,m_j$. Let $p(Z_{ijl}\mid\xi_{ij}c_{jl}\exp(X_i^{\mathrm T}\beta+\eta_i))$ denote the probability mass function of $Z_{ijl}$. Then if the $Z_{ijl}$'s were known, we would have the pseudo-complete data likelihood function, in which $\sum_{t_{jl}\le L_{ij}}Z_{ijl}=0$ and $\sum_{L_{ij}<t_{jl}\le R_{ij}}Z_{ijl}>0$ if $\Delta_{ij}=1$, and $\sum_{t_{jl}\le L_{ij}}Z_{ijl}=0$ if $\Delta_{ij}=0$. In the E-step of the algorithm, we need to determine the conditional expectations $E(Z_{ijl})$ and $E(\xi_{ij}e^{\eta_i})$ given the observed data and the estimates $\theta^{(m)}$ and $\Lambda_0^{(m)}$ at the $m$th iteration in order to calculate $Q(\theta,\Lambda_0;\theta^{(m)},\Lambda_0^{(m)})$; here, for notational simplicity, we omit the conditioning arguments in all of the conditional expectations. In the M-step of the algorithm, by setting $\partial Q(\theta,\Lambda_0;\theta^{(m)},\Lambda_0^{(m)})/\partial c_{jl}=0$, we can update $c_{jl}$ with a closed-form expression for $j=1,2$ and $l=1,\dots,m_j$.

To obtain the sparse estimator of $\beta$, one needs to minimize the penalized objective function (14). For this we employ different algorithms for different penalty functions. In addition to the BAR penalty function, we also consider the least absolute shrinkage and selection operator (LASSO), the adaptive LASSO (ALASSO), and the smoothly clipped absolute deviation (SCAD) penalty functions. The LASSO and ALASSO penalty functions have the forms $p_\lambda(|\beta_k|)=\lambda|\beta_k|$ and $p_\lambda(|\beta_k|)=\lambda|\beta_k|/|\tilde\beta_k|$, with $\tilde\beta_k$ being a consistent estimator of the regression parameter. The derivative of the SCAD penalty function has the form
$$p_\lambda'(x)=\lambda\Big\{I(x\le\lambda)+\frac{(a\lambda-x)_{+}}{(a-1)\lambda}\,I(x>\lambda)\Big\},\qquad x>0,$$
where $p_\lambda'(x)=dp_\lambda(x)/dx$. Note that here we have another tuning parameter $a$, and following Fan and Li (2001) we set $a=3.7$.

To minimize (14) with the use of the LASSO or ALASSO penalty function, we propose to employ the modified shooting algorithm discussed in Zhang and Lu (2007). Define the gradient vector $\nabla H(\theta,\Lambda_0;\theta^{(m)},\Lambda_0^{(m)})=\partial H(\beta,\theta)/\partial\beta$ and the corresponding Hessian matrix $\nabla^2 H$. Using the Cholesky decomposition of the Hessian, one can construct a pseudo design matrix $X$ and a pseudo response vector $Y$ such that the second-order Taylor expansion of $H(\theta,\Lambda_0;\theta^{(m)},\Lambda_0^{(m)})$ turns the objective function into the penalized least squares form $2^{-1}\sum_i(Y_i-X_i^{\mathrm T}\beta)^2+\sum_k p_\lambda(|\beta_k|)$, where $Y_i$ and $X_i$ denote the $i$th components of $Y$ and $X$, respectively. Note that with the LASSO and ALASSO penalty functions we have $p_\lambda(|\beta_k|)=\lambda_k|\beta_k|$, with $\lambda_k=\lambda$ for the LASSO penalty and $\lambda_k=\lambda/|\tilde\beta_k|$ for the ALASSO penalty. Define $F(\beta)=2^{-1}\sum_i(Y_i-X_i^{\mathrm T}\beta)^2$ and $F_k(\beta_k)=\partial F(\beta)/\partial\beta_k$, and let $\beta_{-k}$ denote the $p_n$-dimensional vector $\beta$ with its $k$th component replaced by 0. At the $(m+1)$th iteration, given the current estimate $\beta^{(m)}_{-k}$, the shooting update for $\beta_k$ has a closed soft-thresholding form in terms of $G_{0k}=F_k(\beta^{(m)}_{-k})$ and the $k$th column $X_k$ of $X$.
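The sketch below shows a shooting-type coordinate descent of this kind for the LASSO-penalized quadratic objective. It assumes the pseudo design matrix `X` and the pseudo response `y` have already been formed as described above; for ALASSO one would replace the common `lam` by the coordinate-specific $\lambda_k=\lambda/|\tilde\beta_k|$.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def shooting_lasso(X, y, lam, n_iter=200, tol=1e-8):
    """Shooting-type coordinate descent for
    minimize 0.5 * ||y - X b||^2 + lam * sum_k |b_k|.

    X : (n, p) pseudo design matrix, y : (n,) pseudo response vector.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X**2).sum(axis=0)   # ||X_k||^2 for each coordinate
    r = y - X @ beta              # current residual
    for _ in range(n_iter):
        beta_old = beta.copy()
        for k in range(p):
            r += X[:, k] * beta[k]                        # drop coordinate k
            zk = X[:, k] @ r                              # partial correlation
            beta[k] = soft_threshold(zk, lam) / col_sq[k]
            r -= X[:, k] * beta[k]                        # restore coordinate k
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta
```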
With the SCAD penalty function, note that it is singular at the origin and does not have a continuous second-order derivative. On the other hand, it can be locally approximated by a quadratic function for $\beta_k\neq 0$. Following Fan and Li (2001), we can then determine the solution by iteratively computing the ridge regression
$$\beta^{(m+1)}=\{X^{\mathrm T}X+\Sigma_\lambda(\beta)\}^{-1}X^{\mathrm T}Y.\qquad(17)$$
In the above, $\Sigma_\lambda(\beta)=\mathrm{diag}\{p_\lambda'(|\beta_1|)/|\beta_1|,\dots,p_\lambda'(|\beta_{p_n}|)/|\beta_{p_n}|\}$, and in practice one can simply set $\beta=\beta^{(m)}$. When the BAR penalty function is applied, the objective function is already quadratic in $\beta$, and one can easily derive the closed-form solution $\beta^{(m+1)}=\{X^{\mathrm T}X+\lambda D(\beta)\}^{-1}X^{\mathrm T}Y$, where $D(\beta)=\mathrm{diag}(\beta_1^{-2},\dots,\beta_{p_n}^{-2})$; in practice, again one can simply set $\beta=\beta^{(m)}$. Note that this iteration may sometimes suffer from arithmetic overflow; alternatively, one could use the reformulation of Dai et al. (2018) with $\tilde X(\beta^{(m)})=X\Gamma(\beta^{(m)})$, where $\Gamma(\beta^{(m)})=\mathrm{diag}(\beta_1^{(m)},\dots,\beta_{p_n}^{(m)})$.

Let $\hat\theta=(\hat\beta^{\mathrm T},\hat\gamma)^{\mathrm T}$ and $\hat\Lambda_0=(\hat\Lambda_{01},\hat\Lambda_{02})$ denote the estimators of $\theta$ and $\Lambda_0$ defined above. Also let $\beta_0=(\beta_{0,1},\dots,\beta_{0,p_n})^{\mathrm T}$ denote the true value of $\beta$ and, without loss of generality, assume that we can write $\beta_0=(\beta_{01}^{\mathrm T},\beta_{02}^{\mathrm T})^{\mathrm T}$, where $\beta_{01}$ consists of all $q$ nonzero components and $\beta_{02}$ of the remaining zero components. We partition the BAR estimator $\hat\beta=(\hat\beta_1^{\mathrm T},\hat\beta_2^{\mathrm T})^{\mathrm T}$ correspondingly. The theorem below gives the oracle property of the estimator $\hat\beta$.

Theorem 2. Assume that the regularity conditions given in §3.2 and in Zeng et al. (2017) hold. Then the BAR estimator $\hat\beta=(\hat\beta_1^{\mathrm T},\hat\beta_2^{\mathrm T})^{\mathrm T}$ has the following properties: (i) $\hat\beta_2\rightarrow_p 0$; (ii) $\hat\beta_1$ exists and $\sqrt{n}\,(\hat\beta_1-\beta_{01})\rightarrow_d N_q(0,\Sigma)$, with $\Sigma$ defined in §3.6.

Finally, a variety of methods have been developed for inference on the Youden index and its associated optimal cut point, based mainly on parametric assumptions or bootstrap methods. Schisterman and Perkins (2007) developed a delta method to estimate the confidence interval of the Youden index and its corresponding optimal cut point under normal and gamma distributional assumptions. Lai, Tian and Schisterman (2012) proposed a parametric generalized pivotal quantity approach, and Zhou and Qin (2012) developed an adjusted bootstrap approach for the Youden index. Wang, Tian and Zhao (2017) first proposed a smoothed empirical likelihood approach for the confidence interval of the Youden index and demonstrated its computational efficiency and better coverage probabilities compared with the parametric methods and bootstrap-based procedures. However, transformations and computational algorithms are needed to perform statistical inference for the optimal cut point associated with the index, which may involve computational challenges.

Suppose that $X_1$ and $X_2$ are the measurements of a diagnostic biomarker from the diseased (case) and non-diseased (control) populations, respectively, and let $F_i(\cdot)$ be the cumulative distribution function of $X_i$, $i=1,2$. Without loss of generality, it is assumed that the larger the value of the biomarker, the higher the probability of disease; that is, $X_2$ is stochastically less than $X_1$ ($X_2\le_{\mathrm{st}}X_1$). At the optimal cut point $\tau$, the sensitivity of the biomarker is $1-F_1(\tau)$ and the specificity is $F_2(\tau)$. At a cut point $t$, (1) the Youden index is defined as $J(t)=F_2(t)-F_1(t)$; (2) the closest-to-(0,1) index is $ER(t)=\sqrt{[F_1(t)]^2+[1-F_2(t)]^2}$; and (3) the concordance probability index is $CZ(t)=F_2(t)\{1-F_1(t)\}$. In general, the optimal cut point depends on the index: the optimal cut point based on the Youden index is obtained by maximizing $J(t)$, the one based on the closest-to-(0,1) index by minimizing $ER(t)$, and the one based on the concordance probability index by maximizing $CZ(t)$.

Let $X_{11},X_{12},\dots,X_{1n_1}$ be $n_1$ independent and identically distributed diagnostic biomarker measurements from the diseased group and $X_{21},X_{22},\dots,X_{2n_2}$ be $n_2$ independent and identically distributed measurements from the non-diseased group. Let $p_{j1},\dots,p_{jn_j}$ be probability vectors satisfying $\sum_{i=1}^{n_j}p_{ji}=1$ and $p_{ji}\ge 0$ for $i=1,\dots,n_j$, $j=1,2$, and let $F_1$ and $F_2$ be the empirical distribution functions that assign probability $p_{ji}$ to the $i$th observation $X_{ji}$, $j=1,2$. Let $k(t)$ be a known kernel function given by a probability density function symmetric about 0, and let $k_h(t)=h^{-1}k(t/h)$ and $K_h(t)=\int_{u\le t}k_h(u)\,du$, where $h>0$ is a bandwidth. Then the kernel-smoothed estimators of the probability density function and the distribution function with probability masses $p_{ji}$ ($i=1,\dots,n_j$, $j=1,2$) are
$$\hat f_j(t)=\sum_{i=1}^{n_j}p_{ji}\,k_h(t-X_{ji}),\qquad \hat F_j(t)=\sum_{i=1}^{n_j}p_{ji}\,K_h(t-X_{ji}).$$

Let $\theta=J(\tau)$, where $\tau=\arg\max_t J(t)$, and let $(\tau_0,\theta_0)$ be the true optimal cut point and its corresponding Youden index. Note that the maximizer of $J(t)$ is the maximizer of $F_2(t)-F_1(t)$, so $\tau$ is the solution to $f_1(\tau)=f_2(\tau)$. The empirical likelihood at $(\tau,\theta)$ is $L(\tau,\theta)=\prod_{j=1}^{2}\prod_{i=1}^{n_j}p_{ji}$, maximized subject to the corresponding constraints. Set $f_1(\tau)=f_2(\tau)=\eta_1$, $F_1(\tau)=\eta_2$ and $F_2(\tau)=\eta_2+\theta$, define the associated constraint functions $w_{j1i}$ and $w_{j2i}$, and write $\xi=(\eta_1,\eta_2,\theta)^{\mathrm T}$. The corresponding smoothed log-transformed empirical likelihood ratio at $(\tau,\xi)$ for the Youden index is then maximized by standard Lagrange multipliers: taking the derivative with respect to $p_{11}$ and setting it to zero yields $p_{11}=n_1^{-1}(\lambda_{11}w_{111}+\lambda_{12}w_{121}+1)^{-1}$, and the same steps apply to all $p_{ji}$, $i=1,\dots,n_1$ for $j=1$ and $i=1,\dots,n_2$ for $j=2$; a schematic implementation of these Lagrange-multiplier computations is given below.
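The following is a minimal sketch of these computations, under the stated closed form for the $p_{ji}$'s. Here `w1` and `w2` stand for the two centered constraint functions of one group evaluated at its observations (their exact form, e.g. $K_h(\tau-X_{ji})-\eta_2$, depends on the index under consideration), and the profiling over $\xi$ is omitted for brevity.

```python
import numpy as np
from scipy.optimize import fsolve

def el_weights(w1, w2, lam):
    """Probability masses implied by the Lagrange-multiplier solution,
    p_i = n^{-1} (1 + lam1 * w1_i + lam2 * w2_i)^{-1},
    mirroring the closed form derived above for p_11."""
    n = len(w1)
    return 1.0 / (n * (1.0 + lam[0] * w1 + lam[1] * w2))

def solve_multipliers(w1, w2):
    """Solve the estimating equations sum_i p_i * w_ki = 0, k = 1, 2,
    for the two multipliers (a minimal sketch; in the full procedure one
    also profiles over the parameters eta_1, eta_2, theta entering w1, w2)."""
    def eqs(lam):
        p = el_weights(w1, w2, lam)
        return [np.sum(p * w1), np.sum(p * w2)]
    return fsolve(eqs, x0=[0.0, 0.0])

def log_el_ratio(w1, w2):
    """-2 log empirical likelihood ratio, i.e. -2 * sum_i log(n * p_i)."""
    lam = solve_multipliers(w1, w2)
    p = el_weights(w1, w2, lam)
    return -2.0 * np.sum(np.log(len(w1) * p))
```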
It is then easy to see that the smoothed empirical log-likelihood ratio evaluated at $(\tau,\xi)$ can be rewritten in terms of the multipliers, where $(\lambda_{11},\lambda_{12},\lambda_{21},\lambda_{22},\eta_1,\eta_2,\theta)$ are the solutions to the corresponding estimating equations for $j=1,2$ and $k=1,2$. At the true value of the parameter $\tau_0$, let $l_J^{*}(\tau_0)$ denote this ratio, where $\xi$ is calculated from equations (4.2.6); $l_J^{*}(\tau_0)$ is the log-transformed smoothed empirical likelihood ratio evaluated at the true value $\tau_0$.

Next, let $\theta=ER(\tau)$, where $\tau=\arg\min_t ER(t)$, and let $(\tau_0,\theta_0)$ be the true optimal cut point and its corresponding ER index. Note that the minimizer of $ER(t)$ is the minimizer of $[F_1(t)]^2+[1-F_2(t)]^2$, so $\tau$ is the solution to $F_1(\tau)f_1(\tau)-\{1-F_2(\tau)\}f_2(\tau)=0$. Define $\eta_1=F_1(\tau)f_1(\tau)=\{1-F_2(\tau)\}f_2(\tau)$ and $\eta_2=F_2(\tau)$; then $f_2(\tau)=\eta_1/(1-\eta_2)$ and $F_1(\tau)=\sqrt{\theta^2-(1-\eta_2)^2}$. By steps similar to those of the previous section, the corresponding smoothed log-transformed empirical likelihood ratio for the ER index is obtained with $\xi=(\eta_1,\eta_2,\theta)^{\mathrm T}$, and by standard Lagrange multipliers it can again be rewritten in terms of the multipliers for $j=1,2$ and $k=1,2$. At the true value of the parameter $\tau_0$, the log-transformed smoothed empirical likelihood ratio is $l_{ER}^{*}(\tau_0)$, where $\xi$ is calculated from equations (29).

The concordance probability method proposed by Liu (2012) defines the optimal cut point as the point maximizing the product of sensitivity and specificity, i.e., $\tau=\arg\max_t CZ(t)$. As Liu (2012) pointed out, the concordance probability $CZ(t)$ can be interpreted as the area of a rectangle associated with the ROC curve at the point $t$; the cut point $\tau$ that maximizes $CZ(t)$ thus maximizes the area of this rectangle, and $\tau$ is the solution to $f_2(\tau)\{1-F_1(\tau)\}-F_2(\tau)f_1(\tau)=0$. The smoothed version of the log-transformed empirical likelihood ratio for the CZ index is constructed in the same way, and by standard Lagrange multipliers it can be rewritten in terms of the multipliers, where $(\lambda_{11},\lambda_{12},\lambda_{21},\lambda_{22},\eta_1,\eta_2,\theta)$ are the solutions to the corresponding equations for $j=1,2$ and $k=1,2$. The log-transformed smoothed empirical likelihood ratio evaluated at the true parameter value $\tau_0$ is obtained with $\xi$ calculated from equations (32).

Theorem 3. Under the regularity conditions (i)-(iv), we have...