Semiparametric Analysis Of Informatively Interval-censored Failure Time Data

Posted on: 2019-02-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Y Wang
Full Text: PDF
GTID: 1360330572952961
Subject: Probability theory and mathematical statistics
Abstract/Summary:
In recent years, regression analysis of interval-censored failure time data has attracted extensive attention, and many models and estimation methods have been proposed. Among these, semiparametric models, which combine a parametric part and a nonparametric part, have received particular attention. Interval-censored failure time data occur in many scientific areas, including demographic, financial and medical studies (Sun, 2006). By interval-censored data, we mean that the exact occurrence time of the failure event of interest is not observed and is known only to belong to a window or interval $(L, R)$. In general, there are two main types of interval-censored data: case I and case II. With case I interval-censored data, the observation on each individual failure time is either left- or right-censored, that is, either $L = 0$ or $R = \infty$ (Groeneboom and Wellner, 1992; Huang, 1996). In other words, each study subject is observed only once, and the only information observed about the event of interest is whether it has occurred no later than the observation time. Case I interval-censored data are also often referred to as current status data (Rossini and Tsiatis, 1996; Martinussen and Scheike, 2002). With case II interval-censored data, the failure time of interest is known to lie in some finite interval (Huang and Wellner, 1997; Sun, 1998, 2005). Case K interval-censored data can be seen as one formulation of case II
interval-censored data, in which there exists a set of observation time points and the true failure time of interest falls into one of the observation intervals; the specific form of the data is given in the first chapter. Such data are the focus of this thesis, which discusses three issues in semiparametric regression analysis of case K interval-censored failure time data.

Firstly, we discuss inference for the additive hazards model based on case K interval-censored failure time data. Regression analysis of failure time data has been considered by many authors, and one of the commonly used models is the additive hazards model (Lin and Ying, 1994). Many methods have been developed for the situation where the censoring mechanism is independent of the failure time of interest (Chen et al., 2013; Huang, 1996; Sun, 2006). In some real situations, however, this assumption may not hold; that is, the censoring mechanism is dependent or informative. Some approaches have also been proposed for the case where the censoring mechanism may be related to the failure time of interest (Ma et al., 2015; Wang et al., 2016; Zhang et al., 2005, 2007). For case K interval-censored data, we consider the situation where the failure time and the observation process are dependent, that is, case K informatively interval-censored failure time data. To deal with informative censoring, or to model the relationship between the failure variable of interest and the censoring variables, two commonly used methods are the copula model approach and the frailty model approach. In the following, we present a frailty model-based inference procedure. To introduce the form of case K interval-censored failure time data, consider a failure time study that involves $n$ independent subjects and let $T_i$ denote the failure time of interest for subject $i$. Also for subject $i$, suppose that there exists a $p$-dimensional vector of covariates denoted by $x_i$ and a sequence of observation times $U_{i0} =$
$0 < U_{i1} < U_{i2} < \dots < U_{iK_i}$, where $K_i$ denotes the number of observations on the subject. Define $N_i(t) = \sum_{j=1}^{K_i} I(U_{ij} \le t)$ and $\delta_{ij} = I(U_{i,j-1} < T_i \le U_{ij})$, $i = 1, \dots, n$, $j = 1, \dots, K_i$. Then $N_i(t)$ denotes the total number of observation times up to time $t$ for the $i$th subject and jumps only at each observation time, and case K interval-censored failure time data have the form $O = \{O_i = (\tau_i, U_{ij}, \delta_{ij}, x_i, j = 1, \dots, K_i), i = 1, \dots, n\}$. In the above, $\tau_i$ denotes the follow-up time on the $i$th subject, which is assumed to be independent of $T_i$. To describe the relationship between the failure time of interest and the censoring mechanism, we assume that there exists a latent variable $b_i$ and that, given $x_i$ and $b_i$, $T_i$ follows the additive hazards frailty model $\lambda_i(t \mid x_i, b_i) = \lambda_0(t) + x_i^T \beta_1 + b_i \beta_2$ (1), where $\lambda_0(t)$ denotes an unknown baseline hazard function and $\beta_1$ and $\beta_2$ are unknown regression parameters. Furthermore, it is assumed that, given $x_i$ and $b_i$, $N_i(t)$ is a nonhomogeneous Poisson process with the intensity function $\lambda_{ih}(t \mid x_i, b_i) = \lambda_{0h}(t) \exp(x_i^T \alpha + b_i)$ (2), where $\lambda_{0h}(t)$ is an unknown continuous baseline intensity function and $\alpha$ is a vector of regression parameters, as $\beta_1$ and $\beta_2$ are. The parameter $\beta_2$ represents the extent of the association between the failure time and the observation process; the two are independent if $\beta_2 = 0$. Define $\theta = (\beta_1^T, \beta_2)^T$ and $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$. For inference about models (1) and (2), if the distribution of the $b_i$'s were known, one could employ the observed likelihood function, which involves their distribution and the conditional likelihood function given the $U_{ij}$'s and $b_i$'s; the latter is built from the conditional survival function $S_i(t) = \exp(-\Lambda_0(t) - (x_i^T \beta_1 + b_i \beta_2) t)$. However, the likelihood involves some difficult integrations, and the distribution of the $b_i$'s is usually unknown. To avoid these issues, we present a two-step estimation procedure that can be easily implemented, following Huang and Wang (2004) and Wang et al. (2016). The main idea behind the two-step procedure is to first estimate the unknown parts in model (2) and then
the unknown parts in model (1) with the use of the sieve maximum likelihood approach. In the following, we assume that $\Lambda_{0h}(\tau_0) = 1$, where $\Lambda_{0h}(t) = \int_0^t \lambda_{0h}(s)\,ds$ and $\tau_0$ denotes the longest follow-up time. To estimate model (2), note that under the assumptions and given $x_i$ and $b_i$, the number of observation times $K_i$ follows the Poisson distribution with mean $\Lambda_h(\tau_i \mid x_i, b_i) = \Lambda_{0h}(\tau_i) \exp(x_i^T \alpha + b_i)$. Following the estimation method and results of Wang et al. (2001), one can estimate $\Lambda_{0h}(t)$ by the nonparametric maximum likelihood estimator $\hat\Lambda_{0h}(t) = \prod_{s_{(l)} > t} (1 - d_{(l)} / R_{(l)})$. In the above, the $s_{(l)}$'s are the ordered and distinct values of the observation times $\{U_{ij}\}$, $d_{(l)}$ is the number of observation times equal to $s_{(l)}$, and $R_{(l)}$ is the total number of observation events whose observation time and observation terminating time satisfy $U_{ij} \le s_{(l)} \le \tau_i$. For estimation of the regression parameter $\alpha$, we can define a class of weighted estimating equations in which $\bar{x}_i^T = (1, x_i^T)$ and the $w_i$'s are weights that could depend on the $x_i$'s, the $\tau_i$'s and $\Lambda_{0h}$. Let $\hat\alpha$ denote the resulting estimator of $\alpha$. Then one can estimate, or replace, $b_i$ by $\hat{b}_i = \log\{K_i / [\hat\Lambda_{0h}(\tau_i) \exp(x_i^T \hat\alpha)]\}$ for estimation of the regression parameters $\beta_1$ and $\beta_2$. For inference about model (1), note that if the $b_i$'s were known, the model would become the usual additive hazards model and one could base the inference on the likelihood function $L(\theta, \Lambda_0 \mid b_i\text{'s})$. Hence for estimation it is natural to maximize the estimated or working likelihood function $L(\theta, \Lambda_0 \mid \hat{b}_i\text{'s})$. Note also that this working likelihood involves the infinite-dimensional unknown function $\Lambda_0(t)$, which can make the maximization difficult. To address this, we first employ the sieve approach to approximate $\Lambda_0(t)$; the specific method is given in Chapter 2. The sieve approach has often been employed to simplify estimation problems that involve an unknown function and has been shown to be effective in various contexts, including the frailty model framework (Huang and Rossini, 1997). Define $\phi^T = (\theta^T, \gamma^T)$, where $\gamma$ collects the sieve coefficients, and let $y_i = (x_i^T, b_i)^T$ and $\hat{y}_i = (x_i^T, \hat{b}_i)^T$. Then we can define the sieve maximum likelihood
estimators of $\theta$ and $\Lambda_0(t)$, or of $\phi$, denoted by $\hat\phi = (\hat\theta^T, \hat\gamma^T)^T$, as the values that maximize $l(\theta, \gamma \mid \hat{b}_i\text{'s}) = l(\theta, \Lambda_n(t) \mid \hat{b}_i\text{'s}) = \log L(\theta, \Lambda_n(t) \mid \hat{b}_i\text{'s}) = \sum_{i=1}^n l^{(i)}(\theta, \Lambda_n \mid \hat{b}_i\text{'s})$ over the sieve space $\Theta \times \Gamma_{q_n}$, where $\Theta$ is a bounded subset of $R^{p+1}$. Given $q_n$ and the $t_l$'s, we need to solve the working score equations $l_\theta(\theta, \Lambda_n \mid \hat{b}_i\text{'s}) = 0$ and $l_\gamma(\theta, \Lambda_n \mid \hat{b}_i\text{'s}) = 0$. To implement the estimation approach described above, many existing optimization methods can be used, including the Nelder-Mead simplex algorithm and the Newton-Raphson method. We use the unconstrained nonlinear optimization function nlm in R. For inference about $\theta_0$, we also need to estimate the covariance matrix of $\hat\theta$. We suggest applying the simple bootstrap procedure, following others (Efron, 1979; He et al., 2009; Huang et al., 2010). More specifically, let $B$ be a prespecified positive integer. For each $b = 1, \dots, B$, draw a random sample $O^{(b)} = \{O_i^{(b)}; i = 1, \dots, n\}$ of size $n$ with replacement from the observed data $O$ and let $\hat\theta^{(b)}$ denote the proposed estimator of $\theta$ based on the bootstrap data set $O^{(b)}$. Then a natural estimator of the covariance matrix of $\hat\theta$
is given by the sample covariance matrix of the bootstrap estimates $\hat\theta^{(1)}, \dots, \hat\theta^{(B)}$.

Secondly, we discuss semiparametric analysis of case K interval-censored failure time data in the presence of informative censoring. In contrast to the first study, instead of employing the two-step estimation procedure, we consider a full likelihood-based estimation method. Using a shared frailty term, a joint model is built to describe the correlation between the failure time and the observation process. Informative censoring has been studied by many authors (Ma et al., 2015; Wang et al., 2016). To deal with it, two commonly used methods are the copula model approach (Zhao et al., 2015; Ma et al., 2015) and the latent or frailty model approach (Zhang et al., 2005, 2007; Li et al., 2017; Liu et al., 2016). In the following, we present a frailty model-based inference procedure. Consider a failure time study that consists of $n$ independent subjects. Following the notation of the first problem, we again assume that there exists a latent variable $b$ that links the failure time of interest and the observation process. Thus, for a full random sample of $n$ subjects, $(N_i(\cdot), x_i, \tau_i, U_{ij}, \delta_{ij}, b_i, j = 1, \dots, K_i)$, $i = 1, 2, \dots, n$, are independent and identically distributed random objects. Note that the observed data are $O = \{O_i = (\tau_i, U_{ij}, \delta_{ij}, j = 1, \dots, K_i), i = 1, 2, \dots, n\}$, where $\tau_i$ denotes the follow-up time on the $i$th subject, assumed to be independent of $T_i$, and $\tau_0$ denotes the longest follow-up time. We then make the following model assumptions.
(A1) For subject $i$, there exists a latent variable $b_i$ and, given $x_i$ and $b_i$, the observation process $N_i(t)$ is a nonhomogeneous Poisson process with the intensity function $\lambda_{ih}(t \mid x_i, b_i) = \lambda_{0h}(t) \exp(x_i^T \alpha + b_i)$, where $\alpha$ is a $p \times 1$ vector of regression parameters and $\lambda_{0h}(t)$ denotes a completely unknown continuous baseline intensity function with $\Lambda_{0h}(t) = \int_0^t \lambda_{0h}(s)\,ds$. The latent variable $b_i$ is independent of the covariate $x_i$.
(A2) Given $x_i$ and $b_i$, $T_i$ follows the additive hazards frailty model $\lambda_i(t \mid x_i, b_i) = \lambda_0(t) + x_i^T \beta_1 + b_i \beta_2$, where $\lambda_0(t)$ denotes a completely
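unknown baseline hazard function with $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$, and $\beta_1$ and $\beta_2$ are unknown regression parameters.
(A3) The failure time $T_i$ and the observation process $N_i(\cdot)$ are independent conditional on the frailty term $b_i$ and the covariates $x_i$.
(A4) The $b_i$'s are independent and identically distributed normal random variables with zero mean and unknown variance $\sigma^2$.
Let $\theta = (\beta_1^T, \beta_2, \alpha^T, \sigma^2, \Lambda_0(\cdot), \Lambda_{0h}(\cdot))$ denote the unknown parameters and $f(b_i)$ the density function of the frailty. The full likelihood function for $O = \{O_i, i = 1, \dots, n\}$ is $L_O(\theta) = \prod_{i=1}^n \int L_{\delta_i \mid U_i, N_i(\tau_i) = K_i, b_i}(\theta)\, L_{U_i, N_i(\tau_i) = K_i \mid b_i}(\theta)\, f(b_i; \sigma)\,db_i$, where $S_i(t) = \exp\{-\Lambda_0(t) - t(x_i^T \beta_1 + b_i \beta_2)\}$, $\delta_i = (\delta_{i1}, \dots, \delta_{iK_i})$ and $U_i = (U_{i1}, \dots, U_{iK_i})$. The explicit forms of $L_{\delta_i \mid U_i, N_i(\tau_i) = K_i, b_i}(\theta)$, $L_{U_i, N_i(\tau_i) = K_i \mid b_i}(\theta)$ and $f(b_i; \sigma)$ are given in Chapter 3. It is then natural to maximize the observed-data likelihood function $L_O(\theta)$, but this is not easy to do directly because of the unknown functions $\Lambda_0(\cdot)$ and $\Lambda_{0h}(\cdot)$. To address this, following Huang and Rossini (1997), we use $\Lambda_n(t) = \sum_{l=0}^m \phi_l B_l(t, m, a_1, a_2)$ and $\Lambda_{nh}(t) = \sum_{l=0}^m \xi_l B_l(t, m, a_1, a_2)$ to approximate $\Lambda_0(\cdot)$ and $\Lambda_{0h}(\cdot)$, where $a_1$ and $a_2$ denote the lower and upper bounds of the observation times and the $\phi_l$'s and $\xi_l$'s are unknown parameters to be estimated. In addition, $B_l(t, m, a_1, a_2) = \binom{m}{l} \left(\frac{t - a_1}{a_2 - a_1}\right)^l \left(1 - \frac{t - a_1}{a_2 - a_1}\right)^{m-l}$, where $m$ denotes the degree of the Bernstein polynomials, usually chosen as $m = o(n^v)$ for some $0 < v < 1/2$, and $M_n = O(n^a)$ for some $0 < a < 1/2$. After the above approximation, we present an EM algorithm to estimate the parameters of interest. Define the complete data to be $\{(O_i, b_i), i = 1, \dots, n\}$. With $b = (b_1, \dots, b_n)^T$, the complete-data likelihood function is $L_C(\theta; O, b) = \prod_{i=1}^n L_{\delta_i \mid U_i, N_i(\tau_i) = K_i, b_i}(\theta)\, L_{U_i, N_i(\tau_i) = K_i \mid b_i}(\theta)\, f(b_i; \sigma)$ (4). Then compute the expectation of the logarithm of (4) at the $(k+1)$th iteration, conditional on the observed data and the current estimate, that is, $Q(\theta \mid O, \theta^{(k)}) = E[l_C(\theta; O, b) \mid O, \theta^{(k)}]$ (5), in which the normal frailty density contributes the term $-\sum_{i=1}^n E[\tfrac{1}{2} \log 2\pi + \log \sigma + b_i^2 / (2\sigma^2) \mid O_i, \theta^{(k)}]$. In calculating the above conditional expectation, we need to evaluate integrals of the form $E\{g(b_i) \mid O_i, \theta^{(k)}\} = \int g(b_i) f(b_i \mid O_i, \theta^{(k)})\,db_i$ (6), where $g(b_i)$ is a function of $b_i$ and $f(b_i \mid O_i, \theta^{(k)})$ is the probability density function of $b_i$ conditional on the observed data and the $k$th iteration estimate of $\theta$. Here, since the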
integral (6) does not have a closed form, we employ the Monte Carlo method. Maximizing the conditional expectation (5) with respect to $\theta$ at the $(k+1)$th iteration yields the score functions $S_{\beta_1}(\Theta_1)$, $S_{\beta_2}(\Theta_1)$, $S_{\phi_l}(\Theta_1)$, $S_{\alpha}(\Theta_2)$ and $S_{\xi_l}(\Theta_2)$, with $\Theta_1$ and $\Theta_2$ denoting the parameters of the failure time model and of the observation process, respectively; their specific forms are given in Chapter 3. Setting the score functions to zero gives the updated estimators at the $(k+1)$th iteration. In summary, combining all the preceding steps, we suggest the following algorithm.
Step 1. Choose $m$ and initial values for all parameters, that is, $\theta^{(0)}$.
Step 2. At the $(k+1)$th iteration, calculate the conditional expectations required by the score functions, including $E_i(b_i)$, $E_i[e^{b_i}]$ and $E_i(b_i^2)$, at $\theta = \theta^{(k)}$.
Step 3. Obtain the updated estimators $\beta_1^{(k+1)}$ and $\beta_2^{(k+1)}$ by solving the equations $S_{\beta_1}(\Theta_1) = 0$ and $S_{\beta_2}(\Theta_1) = 0$ with $\phi_l = \phi_l^{(k)}$, $l = 0, 1, \dots, m$.
Step 4. Update the estimators $\phi_l^{(k+1)}$ by solving the equations $S_{\phi_l}(\Theta_1) = 0$ with $\beta_1 = \beta_1^{(k+1)}$ and $\beta_2 = \beta_2^{(k+1)}$.
Step 5. Obtain the updated estimator $\alpha^{(k+1)}$ by solving the equations $S_{\alpha}(\Theta_2) = 0$ with $\xi_l = \xi_l^{(k)}$, $l = 0, 1, \dots, m$.
Step 6. Update the estimators $\xi_l^{(k+1)}$ by solving the equations $S_{\xi_l}(\Theta_2) = 0$ with $\alpha = \alpha^{(k+1)}$.
Step 7. Determine the estimator $\sigma^{2(k+1)}$ from its explicit expression.
Step 8. Repeat Steps 2-7 until convergence.
Under some regularity conditions, the theoretical results for the estimator $\hat\theta$
are summarized in the following theorems, where the limits are taken as $n \to \infty$.
Theorem 1. Under Conditions (C3.1)-(C3.5) given in Chapter 3, $\hat\beta_1$, $\hat\beta_2$, $\hat\alpha$ and $\hat\sigma^2$ are strongly consistent estimators of $\beta_{10}$, $\beta_{20}$, $\alpha_0$ and $\sigma_0^2$, respectively, and furthermore $\|\hat\Lambda_n - \Lambda_0\|_2 \to 0$ and $\|\hat\Lambda_{nh} - \Lambda_{0h}\|_2 \to 0$ almost surely.
Theorem 2. Under Conditions (C3.1)-(C3.5) given in Chapter 3, $d(\hat\theta, \theta_0) = O_p(n^{-(1-v)/2} + n^{-rv/2})$. Moreover, $d(\hat\theta, \theta_0) = O_p(n^{-r/(2+2r)})$ if $v = 1/(1+r)$.
Theorem 3. Suppose that Conditions (C3.1)-(C3.6) given in Chapter 3 hold. Then $n^{1/2}((\hat\beta_1 - \beta_{10})^T, (\hat\beta_2 - \beta_{20}), (\hat\alpha - \alpha_0)^T, (\hat\sigma^2 - \sigma_0^2))^T \to N(0, \Sigma)$ in distribution, and $(\hat\beta_1^T, \hat\beta_2, \hat\alpha^T, \hat\sigma^2)^T$ is semiparametrically efficient.
For estimation of the standard errors of the above parameter estimators, we employ the simple bootstrap procedure, following others (Efron, 1979; He et al., 2009; Huang et al., 2010).

Finally, we discuss semiparametric regression analysis of general interval-censored failure time data where there exists a sequence of observation times, in the presence of dependent censoring. For most statistical methods, a common assumption is that all subjects will experience or are susceptible to the failure event of interest. In reality, owing to advances in modern medical techniques and health care, survival rates have improved. There may exist situations where the study population consists of both a susceptible subpopulation and a cured subpopulation that is not at risk of experiencing the event of interest, so that only individuals in the susceptible subpopulation will experience the failure event. Two major methods to model the cure rate are the mixture cure model and the non-mixture cure model. The non-mixture cure model has been studied by many authors (Tsodikov, 1998; Tsodikov et al., 2003; Zeng et al., 2006; Liu and Shen, 2009; Hu and Xiang, 2013). The mixture cure model has also been investigated extensively in the literature (Berkson and Gage, 1952; Farewell, 1982; Kuk and Chen, 1992; Lam and Xue, 2005; Mao and Wang, 2010). It is a mixture of two separate regression models that depend on different
parameters, allowing for separate covariate interpretations for the cure function and for the survival function of the non-cured population. The research that follows is a semiparametric analysis based on the mixture cure model with informative interval censoring. Consider a failure time study that consists of $n$ independent subjects and in which there may exist a cured subgroup. Let $T$ denote the failure time of interest and $X \in R^p$ a vector of covariates. Under the mixture cure modeling approach (Farewell, 1982), the failure time is decomposed as $T = Y T^* + (1 - Y)\infty$, where $Y$ is the cure indicator variable, taking the value 1 if the study subject is susceptible and 0 if the subject is cured and nonsusceptible, and $T^* < \infty$ denotes the failure time of interest of a susceptible subject. Suppose that a vector of covariates $Z \in R^q$ may also have some effects on $Y$. We use a logistic model for the cure indicator, $\pi(Z) = P(Y = 1 \mid Z) = \exp(\eta^T Z) / \{1 + \exp(\eta^T Z)\}$ (7). Here $\eta$ is an unknown $q$-dimensional regression parameter vector, and the covariates $Z$ may be the same as, a part of, or totally different from $X$. We assume that the failure time $T$ may not be observed exactly; instead, we have a sequence of observation time points $U_{i0} = 0 < U_{i1} < U_{i2} < \dots < U_{iK_i}$ and $\delta_{ij} = I(U_{i,j-1} < T_i \le U_{ij})$, $i = 1, \dots, n$, $j = 1, \dots, K_i$. That is, the failure time of interest is known only to belong to a time interval, and $K_i$ denotes the total number of observation events. We introduce an observation process $N(t)$: let $N(t) = \int_0^t dN(u)$ be the number of observation times in $(0, t]$ for a subject, where $dN(t) = N(t + dt) - N(t)$ denotes the number of observation times in the small time interval $(t, t + dt]$. For a follow-up time $\tau$, $N(\tau) = K$. Thus, the observed data are $\{O_i = (X_i, Z_i, \tau_i, U_{ij}, \delta_{ij}, K_i, j = 1, \dots, K_i), i = 1, 2, \dots, n\}$, where $\tau_i$ denotes the follow-up time for the $i$th subject, assumed to be independent of $T_i$. That is, we have only case K interval-censored data. In fact, the failure time of interest may be related to the
observation process. As before, to describe the relationship between the failure time and the observation process, we assume that there exists a frailty term $b$, which serves as the bridge between the two. The model assumptions are similar to those of the second problem, and the specifics are given in Chapter 4. Let $\theta = (\beta_1^T, \beta_2, \eta^T, \alpha^T, \sigma^2, \Lambda_0(\cdot), \Lambda_{0h}(\cdot))$ denote the unknown parameters and $f(b_i)$ the density function of the frailty; the full likelihood function for $O = \{O_i, i = 1, \dots, n\}$ is built from these components. In the following, we discuss estimation of the parameters of interest. Following Liu et al. (2016), we propose to employ the sieve approach to approximate $\Lambda_0(\cdot)$ and $\Lambda_{0h}(\cdot)$ by Bernstein polynomials over $I = [a_1, a_2]$, where $a_1$ and $a_2$ denote the lower and upper bounds of the observation times. For estimation of the parameters, we present an EM algorithm. First note that if the $b_i$'s were observed, the pseudo-complete data likelihood function based on $\{(O_i, b_i), i = 1, \dots, n\}$ would be $L_c(\theta) = L_1(\Theta_1) L_2(\Theta_2) L_3(\sigma^2)$ (9). In the above, $\Theta_1 = (\beta_1^T, \beta_2, \eta, \Lambda_0(\cdot))$ and $\Theta_2 = (\alpha, \Lambda_{0h}(\cdot))$. Then compute the expectation of the logarithm of (9) at the $(k+1)$th iteration, conditional on the observed data and the current estimate, that is, $Q(\theta \mid O, \theta^{(k)}) = E[l_C(\theta; O, b) \mid O, \theta^{(k)}] = E_b[\log L_1(\Theta_1) \mid O, \theta^{(k)}] + E_b[\log L_2(\Theta_2) \mid O, \theta^{(k)}] + E_b[\log L_3(\sigma^2) \mid O, \theta^{(k)}]$ (10), where $E_b[l_C(\theta) \mid O, \theta^{(k)}]$ denotes the conditional expectation of $\log L_c(\theta)$ with respect to $b$ given $O$ and the current parameter value $\theta^{(k)}$. In the M-step, we maximize $E_b[\log L_1(\Theta_1) \mid O, \theta^{(k)}]$, $E_b[\log L_2(\Theta_2) \mid O, \theta^{(k)}]$ and $E_b[\log L_3(\sigma^2) \mid O, \theta^{(k)}]$ with respect to $\Theta_1$, $\Theta_2$ and $\sigma^2$, respectively. These expectations do not have closed forms, so we employ the Monte Carlo method; the details are given in Chapter 4. In the $(k+1)$th iteration, we maximize the conditional expectation (10) with respect to $\theta$
at the $k$th iteration estimate, which yields the score functions $S_{\beta_1}(\Theta_1)$, $S_{\beta_2}(\Theta_1)$, $S_{\eta}(\Theta_1)$, $S_{\phi_l}(\Theta_1)$, $S_{\alpha}(\Theta_2)$ and $S_{\xi_l}(\Theta_2)$, whose specific forms are presented in Chapter 4; setting the score functions to zero gives the updated estimators at the $(k+1)$th iteration. In summary, combining all the steps above, we have the following algorithm.
Step 1. Choose $m$ and initial values for all parameters, that is, $\theta^{(0)}$.
Step 2. At the $(k+1)$th iteration, calculate the conditional expectations required by the score functions, including $E_i[e^{b_i}]$ and $E_i(b_i^2)$, at $\theta = \theta^{(k)}$.
Step 3. Obtain the updated estimators $\beta_1^{(k+1)}$, $\beta_2^{(k+1)}$ and $\eta^{(k+1)}$ by solving the equations $S_{\beta_1}(\Theta_1) = 0$, $S_{\beta_2}(\Theta_1) = 0$ and $S_{\eta}(\Theta_1) = 0$ with $\phi_l = \phi_l^{(k)}$, $l = 0, 1, \dots, m$.
Step 4. Update the estimators $\phi_l^{(k+1)}$ by solving the equations $S_{\phi_l}(\Theta_1) = 0$ with $\beta_1 = \beta_1^{(k+1)}$, $\beta_2 = \beta_2^{(k+1)}$ and $\eta = \eta^{(k+1)}$.
Step 5. Obtain the updated estimator $\alpha^{(k+1)}$ by solving the equations $S_{\alpha}(\Theta_2) = 0$ with $\xi_l = \xi_l^{(k)}$, $l = 0, 1, \dots, m$.
Step 6. Update the estimators $\xi_l^{(k+1)}$ by solving the equations $S_{\xi_l}(\Theta_2) = 0$ with $\alpha = \alpha^{(k+1)}$.
Step 7. Determine the estimator $\sigma^{2(k+1)}$ from its explicit expression.
Step 8. Repeat Steps 2-7 until convergence.
Under some regularity conditions, the theoretical results for the estimator $\hat\theta$ are summarized in the following theorems, where the limits are taken as $n \to \infty$.
Theorem 4. Under Conditions (C4.1)-(C4.5) given in Chapter 4, $\hat\beta_1$, $\hat\beta_2$, $\hat\eta$, $\hat\alpha$ and $\hat\sigma^2$ are strongly consistent estimators of $\beta_{10}$, $\beta_{20}$, $\eta_0$, $\alpha_0$ and $\sigma_0^2$, respectively, and furthermore $\|\hat\Lambda_n - \Lambda_0\|_2 \to 0$ and $\|\hat\Lambda_{nh} - \Lambda_{0h}\|_2 \to 0$ almost surely.
Theorem 5. Under Conditions (C4.1)-(C4.5) given in Chapter 4, $d(\hat\theta, \theta_0) = O_p(n^{-(1-v)/2} + n^{-rv/2})$. Moreover, $d(\hat\theta, \theta_0) = O_p(n^{-r/(2+2r)})$ if $v = 1/(1+r)$.
Theorem 6. Suppose that Conditions (C4.1)-(C4.6) given in Chapter 4 hold. Then $n^{1/2}((\hat\beta_1 - \beta_{10})^T, (\hat\beta_2 - \beta_{20}), (\hat\eta - \eta_0)^T, (\hat\alpha - \alpha_0)^T, (\hat\sigma^2 - \sigma_0^2))^T \to N(0, \Sigma)$ in distribution, and $(\hat\beta_1^T, \hat\beta_2, \hat\eta^T, \hat\alpha^T, \hat\sigma^2)^T$ is semiparametrically efficient.
For estimation of the standard errors of the above parameter estimators, we employ the simple bootstrap procedure, following others (Efron, 1979; He et al., 2009; Huang et al., 2010).
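The Bernstein polynomial sieve used in the second and third problems can be illustrated with a short sketch. The Python code below evaluates the Bernstein basis of degree m on [a1, a2] and a sieve approximation of a cumulative baseline function as a weighted sum of basis terms. The function names are hypothetical, and the cumulative-sum-of-exponentials reparameterization used to keep the coefficients nondecreasing is an assumption made here for illustration; the abstract does not state how the thesis enforces monotonicity.

```python
from math import comb, exp


def bernstein_basis(t, m, a1, a2):
    """Return [B_0(t), ..., B_m(t)], the degree-m Bernstein basis on [a1, a2]."""
    if not a1 <= t <= a2:
        raise ValueError("t must lie in [a1, a2]")
    u = (t - a1) / (a2 - a1)  # rescale t to [0, 1]
    return [comb(m, l) * u**l * (1.0 - u) ** (m - l) for l in range(m + 1)]


def monotone_coefs(free_params):
    """Hypothetical reparameterization (an assumption, not from the thesis):
    cumulative sums of exp(.) give positive, increasing coefficients."""
    coefs, total = [], 0.0
    for p in free_params:
        total += exp(p)
        coefs.append(total)
    return coefs


def sieve_cumulative_hazard(t, coefs, a1, a2):
    """Evaluate sum_l coefs[l] * B_l(t, m, a1, a2) with m = len(coefs) - 1."""
    m = len(coefs) - 1
    return sum(c * b for c, b in zip(coefs, bernstein_basis(t, m, a1, a2)))


if __name__ == "__main__":
    a1, a2 = 0.0, 2.0
    coefs = monotone_coefs([-1.0, 0.3, -0.5, 0.1])  # four coefficients, so m = 3
    for t in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(t, round(sieve_cumulative_hazard(t, coefs, a1, a2), 4))
```

Because a Bernstein polynomial with nondecreasing coefficients is itself nondecreasing, this parameterization lets an unconstrained optimizer search over the free parameters while the fitted curve remains a valid cumulative function.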
Keywords/Search Tags:Interval-censored data, Dependent censoring, Proportional hazards model, Additive hazards model, Frailty model, Cure model, EM algorithm