
Statistical Inference For Panel Count Data

Posted on: 2017-03-13; Degree: Doctor; Type: Dissertation
Country: China; Candidate: D Xu; Full Text: PDF
GTID: 1220330482492035; Subject: Probability theory and mathematical statistics
Abstract/Summary:
The analysis of panel count data has recently attracted considerable attention. Panel count data usually arise in studies of recurrent events in which each study subject is examined or observed only at discrete time points, so that only the numbers of occurrences of the events between successive observation times are available. Furthermore, the number of observations and the observation times may vary from subject to subject. Areas in which one often faces such data include periodic medical follow-up studies, reliability experiments, AIDS clinical trials, animal tumorigenicity experiments, and sociological studies (Kalbfleisch and Lawless, 1985; Sun and Zhao, 2013; Thall and Lachin, 1988).

To describe the form of panel count data, consider a recurrent event study that involves n independent subjects. Each subject gives rise to a counting process N_i(t), representing the total number of occurrences of the recurrent event of interest for subject i up to time t, which is often referred to as the recurrent event process. For subject i, suppose that N_i(t) is observed only at the finite time points T_{i,1} < T_{i,2} < … < T_{i,m_i}, where m_i denotes the number of observations. In practice, there usually also exists a follow-up time C_i, meaning that the subject is followed only up to time C_i. Define O_i(t) = O_i^*(C_i ∧ t), where O_i^*(t) = Σ_{j=1}^∞ I(T_{i,j} ≤ t) and a ∧ b denotes the minimum of a and b, i = 1, 2, ..., n. O_i^*(t) and O_i(t) are often referred to as the underlying and observed observation processes, respectively. Then the observed panel count data on the N_i(t)'s have the form {C_i, m_i, T_{i,j}, N_i(T_{i,j}); j = 1, 2, ..., m_i, i = 1, 2, ..., n}. To deal with panel count data, one faces both the process of interest N_i(t) and the observation process O_i(t), in addition to the variable C_i or the counting process I(t ≤ C_i). Of course, if all three processes are independent of each other, one only needs to focus on the recurrent event process N_i(t) and can conduct the analysis conditional on the other two processes.
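As an illustration of the data format just described, the following minimal Python sketch stores one subject's record {C_i, m_i, T_{i,j}, N_i(T_{i,j})} and evaluates the observed observation process O_i(t) = O_i^*(C_i ∧ t). The class name `PanelCountRecord`, its field names, and the example numbers are hypothetical choices made for this sketch; they do not come from the dissertation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PanelCountRecord:
    """Observed data for one subject (hypothetical container for this sketch)."""
    follow_up: float        # C_i, the follow-up (censoring) time
    obs_times: List[float]  # T_{i,1} < ... < T_{i,m_i}
    cum_counts: List[int]   # N_i(T_{i,j}), cumulative counts at each visit

    def __post_init__(self) -> None:
        if len(self.obs_times) != len(self.cum_counts):
            raise ValueError("one cumulative count is needed per observation time")
        if any(a >= b for a, b in zip(self.obs_times, self.obs_times[1:])):
            raise ValueError("observation times must be strictly increasing")
        if any(a > b for a, b in zip(self.cum_counts, self.cum_counts[1:])):
            raise ValueError("cumulative counts must be nondecreasing")

    def observation_process(self, t: float) -> int:
        """O_i(t): number of observation times not exceeding min(C_i, t)."""
        cutoff = min(self.follow_up, t)
        return sum(1 for s in self.obs_times if s <= cutoff)

    def interval_counts(self) -> List[int]:
        """Numbers of events between successive observation times."""
        previous = [0] + self.cum_counts[:-1]
        return [c - p for c, p in zip(self.cum_counts, previous)]


# Made-up example: three visits, followed up to time 10.
subject = PanelCountRecord(follow_up=10.0,
                           obs_times=[2.0, 5.0, 9.0],
                           cum_counts=[1, 4, 6])
visits_by_6 = subject.observation_process(6.0)
increments = subject.interval_counts()
```

Only the counts at the visit times are stored, which mirrors the point that the exact event times are never observed.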
In practice, however, it can happen that the recurrent event process of interest N_i(t) and the observation process O_i^*(t) are related. In this case, the analysis is much more difficult, and the resulting data are often referred to as panel count data with informative or dependent observation processes.

For treatment comparison based on panel count data, a number of procedures have been developed in the literature. However, most of the available procedures require that the observation times of all subjects follow the same observation process, which clearly may not be true in practice. A couple of nonparametric comparison procedures for panel count data do allow unequal observation processes, but they apply only to limited situations. We therefore propose a new class of nonparametric test procedures that allow unequal observation processes and apply to more general situations.

Consider a recurrent event study that involves n independent subjects from m different treatment groups, in which each subject may experience a single type of recurrent event. Suppose that there are n_l subjects in the lth group, l = 1, 2, ..., m, with Σ_{l=1}^m n_l = n, and let S_l denote the set of indices for the subjects belonging to group l. Also let N_i(t) be the underlying counting process representing the total number of occurrences of the recurrent event of interest up to time t for subject i, and let Z_i and C_i denote the group-indicating vector associated with the subject and the censoring or follow-up time on the subject, respectively. In the following, we assume that each subject is observed only at the discrete time points T_{i,1} < T_{i,2} < ... < T_{i,m_i}, where m_i represents the total number of observation times on subject i. Then the observed data have the form {Z_i, C_i, m_i, T_{i,j}, N_i(T_{i,j}); j = 1, 2, ..., m_i, i = 1, 2, ..., n}. That is, one only has panel count data on the N_i(t)'s.
Our goal is to test the null hypothesis H_0: μ_1(t) = μ_2(t) = … = μ_m(t), where μ_l(t) = E{N_i(t) | Z_i} for i ∈ S_l is the mean function of the underlying event history process for the subjects in group l.

Define O_i(t) = O_i^*(C_i ∧ t), where O_i^*(t) = Σ_{j=1}^∞ I(T_{i,j} ≤ t) and a ∧ b denotes the minimum of a and b, i = 1, 2, ..., n. Note that O_i^*(t) and O_i(t) represent the underlying and observed observation processes, respectively. As discussed above, the subjects in different treatment groups may have different observation processes; that is, the observation processes may depend on the Z_i's. To characterize this, we assume that O_i^*(t) can be described by the proportional rate model

E{dO_i^*(t) | Z_i} = λ_0(t) exp(γ'Z_i) dt, (1)

where λ_0(t) is an unspecified continuous function and γ is a vector of unknown regression parameters. Under this model, γ = 0 means that the observation processes are independent of the treatments, or identical for all subjects. In the following, we assume that N_i(t) and O_i^*(t) are independent of each other given Z_i, and also that C_i is independent of {N_i(t), O_i^*(t), Z_i}, i = 1, 2, ..., n. Some comments on these assumptions will be given below.

To construct the test statistics, note that under model (1) and conditional on Z_i, for i ∈ S_l, one can easily show that [equation omitted in source], where G(t) = P(C_i ≥ t) and τ denotes the longest follow-up time. It thus follows that [equation omitted in source]. Let γ̂ denote a consistent estimator of γ, which will be discussed below, and define [equation omitted in source]. To test H_0, motivated by the idea commonly used for recurrent event data (Cook et al., 1996), we propose to employ the test statistic J(γ̂) = (J_1(γ̂), J_2(γ̂), …, J_{m-1}(γ̂))' with

J_l(γ̂) = φ_l(γ̂) − φ_m(γ̂), l = 1, 2, …, m − 1, (3)

where [definition of φ_l omitted in source]. In the above, W(t) denotes a predictable weight process and [symbol omitted in source] denotes the sample mean of N_i(t; γ̂) over i ∈ S_l, l = 1, 2, ..., m.
It is easy to see that J_l(γ̂) represents the difference, between groups l and m, of the sample means of the integrated weighted responses from the underlying recurrent event processes.

The proposed test statistics assume that an estimator of γ is available. For this, note that we have recurrent event data on the O_i^*(t)'s under model (1), and thus it is natural to define γ̂ as the solution to the estimating equation [equation omitted in source] (Lin et al., 2000; Cook and Lawless, 2007). It has been shown that the estimator so defined is consistent and unique.

Theorem 1. Under some regularity conditions and under H_0, J(γ̂) converges in distribution to a multivariate normal distribution with mean vector 0 and a covariance matrix that can be consistently estimated by V = (v_{ij})_{(m-1)×(m-1)}, where [entries of V omitted in source] for j ≠ h, j, h = 1, 2, ..., m − 1.

Thus one can test the null hypothesis H_0 by using the statistic T(γ̂) = J(γ̂)' V^{-1} J(γ̂), based on the χ² distribution with m − 1 degrees of freedom.

Interval-censored failure time data and panel count data are two types of incomplete data that commonly occur in event history studies (Sun, 2006; Sun and Zhao, 2013). The former concern the occurrence rate of a failure event, that is, an event that occurs only once or whose first occurrence alone is of interest, while the latter provide information on the occurrence rate of a recurrent event. Furthermore, the former mean that the occurrence time of the event is observed or known only to belong to an interval, while the latter mean that one observes only the numbers of occurrences of the event between some discrete observation times. A common feature is that both typically arise under a periodic follow-up observation scheme.

Many authors have discussed the analysis of either interval-censored failure time data or panel count data. However, there does not seem to exist an established approach for the joint analysis of the two types of data.
Here, we present a sieve maximum likelihood approach for the problem.

Consider an event history study that involves n independent subjects and two events of interest, a failure event and a recurrent event. For subject i, suppose that there exists a p-dimensional vector of covariates denoted by Z_i, and let T_i and N_i(t) denote the occurrence time of the failure event and the number of occurrences of the recurrent event up to time t, respectively, i = 1, 2, ..., n. Also suppose that each subject is observed only at a sequence of time points denoted by s_{i,1} < s_{i,2} < … < s_{i,m_i}, where m_i denotes the number of observations on subject i. Hence for each subject, one only observes [observed data omitted in source], where s_{i,0} = 0 and s_{i,m_i+1} = ∞. That is, we only have interval-censored data on the T_i's and panel count data on the N_i(t)'s.

To describe the effects of covariates on T_i and N_i(t), we assume that there exists a latent variable η_i with mean 1 and unknown variance γ > 0. Suppose that given Z_i and η_i, the cumulative hazard function of T_i has the form [equation omitted in source], where Λ_1 denotes an unknown baseline cumulative hazard function and α is a vector of regression parameters. That is, T_i follows the proportional hazards frailty model. For N_i(t), we assume that it is a nonhomogeneous Poisson process with the proportional mean function [equation omitted in source], where Λ_2 is an unknown, nondecreasing baseline mean function and β, like α, is a vector of regression parameters. In the following, we assume that conditional on Z_i and η_i, T_i and N_i(t) are independent.
Also, given Z_i, {m_i, s_{i,j}; j = 1, 2, …, m_i} and {η_i, T_i, N_i(t)} are independent, and the conditional distribution of {m_i, s_{i,j}; j = 1, 2, …, m_i} given Z_i does not involve the parameters in models (7) and (8).

Define θ = (α, β, γ, Λ_1, Λ_2). Then the likelihood function of θ is given by L_n(θ) = Π_{i=1}^n L(θ | O_i), where [expression omitted in source]. Note that if the η_i's are assumed to follow the gamma distribution, the likelihood contribution L(θ | O_i) can be simplified to [closed-form expression omitted in source].

We now discuss the estimation of θ. A natural way would be to maximize the log likelihood function l_n(θ) = log{L_n(θ)}. However, this is not straightforward, because θ involves the infinite-dimensional parameters Λ_1 and Λ_2. Thus instead, following Huang and Rossini (1997) and others, we propose to employ the sieve maximum likelihood estimation approach. More specifically, define the parameter space of θ as [definition omitted in source] and the sieve space as [definition omitted in source]. In the above, M is a positive constant, M_j denotes the collection of all bounded, continuous, nondecreasing, nonnegative functions over the interval [c_j, u_j], and the sieve is built from the Bernstein basis polynomials of degree m = o(n^ν) for some ν ∈ (0, 1), where 0 ≤ c_j < u_j < ∞ with [c_j, u_j] usually taken as the range of the observed data, j = 1, 2.

It is easy to see that one can approximate the parameter space Θ by the Bernstein polynomial-based sieve space, or approximate Λ_j(t) by Λ_{jn}(t) with the coefficients φ_{jk} = Λ_j(c_j + (k/m)(u_j − c_j)), j = 1, 2. Although the use of Bernstein polynomials may seem complex, it can be implemented relatively easily, as seen below. The use of Bernstein polynomials turns an estimation problem about both finite-dimensional and infinite-dimensional parameters into a simpler estimation problem that involves only finite-dimensional parameters.
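The approximation of Λ_j(t) by a Bernstein polynomial with coefficients φ_{jk} = Λ_j(c_j + (k/m)(u_j − c_j)) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and a known test function stands in for Λ_j, whereas in the actual estimation problem Λ_j is unknown and the coefficients are parameters to be estimated.

```python
from math import comb
from typing import Callable, List


def bernstein_basis(k: int, m: int, x: float) -> float:
    """k-th Bernstein basis polynomial of degree m, evaluated at x in [0, 1]."""
    return comb(m, k) * x**k * (1.0 - x) ** (m - k)


def bernstein_coefficients(func: Callable[[float], float],
                           m: int, c: float, u: float) -> List[float]:
    """Coefficients phi_k = func(c + (k/m)(u - c)), k = 0, ..., m."""
    return [func(c + (k / m) * (u - c)) for k in range(m + 1)]


def bernstein_approx(coeffs: List[float], t: float, c: float, u: float) -> float:
    """Evaluate sum_k phi_k * B_k((t - c)/(u - c)) for t in [c, u]."""
    m = len(coeffs) - 1
    x = (t - c) / (u - c)
    return sum(phi * bernstein_basis(k, m, x) for k, phi in enumerate(coeffs))


# Stand-in for a nondecreasing, nonnegative baseline function on [0, 2].
def baseline(t: float) -> float:
    return t * t


coeffs = bernstein_coefficients(baseline, m=6, c=0.0, u=2.0)
grid = [0.1 * i for i in range(21)]
approx_values = [bernstein_approx(coeffs, t, 0.0, 2.0) for t in grid]
```

Because the coefficients come from a nondecreasing, nonnegative function, they are themselves nondecreasing and nonnegative, and the resulting polynomial inherits both properties on [c, u].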
One advantage of Bernstein polynomials is that they can naturally model the nonnegativity and monotonicity of Λ_1 and Λ_2 through simple restrictions that can easily be removed by reparameterization in implementation (Osman and Ghosh, 2012). One can also show that the size of the sieve space defined above can be controlled by M_n = O(n^a), with a being a positive constant (Lorentz, 1986; Shen, 1997).

For estimation of θ, define the sieve maximum likelihood estimator θ̂_n = (α̂_n', β̂_n', γ̂_n, Λ̂_{1n}, Λ̂_{2n}) to be the value of θ that maximizes the log likelihood function l_n(θ) = log{L_n(θ)} over the sieve space. To establish the asymptotic properties of θ̂_n, let θ_0 = (α_0', β_0', γ_0, Λ_{10}, Λ_{20}) denote the true value of θ and define the L_2(P) norm ‖f(X)‖_2 = (∫ |f|² dP)^{1/2} for a function f, with P being the probability measure for X. The following theorems give the consistency, the rate of convergence, and the asymptotic normality of the estimator.

Theorem 2. Under some regularity conditions, α̂_n, β̂_n, and γ̂_n are strongly consistent, and as n → ∞, [convergence statement omitted in source] almost surely.

Theorem 3. Suppose that some regularity conditions hold and r > 2, with r defined in one of the regularity conditions. Then as n → ∞, we have that [rate of convergence omitted in source].

Theorem 4. Suppose that some regularity conditions hold and r > 2, with r defined in one of the regularity conditions. Then as n → ∞, √n (v̂_n − v_0) converges in distribution to a multivariate normal distribution with mean zero, and v̂_n is semiparametrically efficient, where v̂_n = (α̂_n', β̂_n', γ̂_n)' and v_0 = (α_0', β_0', γ_0)'.

To make use of the results above, one needs to estimate the covariance matrix of v̂_n. One natural way would be to employ the inverse of the information matrix of the log likelihood function l_n(θ). However, this is quite difficult because of the complicated form of the information matrix.
To deal with this, we suggest using the profile approach, which approximates the (i, j) element of the information matrix by [expression omitted in source], where pl_n(v) = sup_{Λ_1,Λ_2} l_n(v, Λ_1, Λ_2), e_i is a unit vector with the ith element equal to 1 and all other elements equal to 0, and ρ_n is a tuning constant of order n^{-1/2}.

For the implementation of the estimation procedure proposed above, two issues need to be addressed. One is that there exist some restrictions on the parameters due to the nonnegativity and monotonicity of the functions Λ_1 and Λ_2; these can easily be removed by reparameterization. A natural way is to reparameterize the frailty variance parameter γ as exp(γ*) and the parameters {φ_{j0}, ..., φ_{jm}} as the cumulative sums of {exp(φ_{j0}*), ..., exp(φ_{jm}*)}, j = 1, 2, which gives a total of 2(p + m) + 3 parameters to be estimated. The other issue is the selection of the degree m of the Bernstein polynomials for the sieve space Θ_n, which controls the roughness or smoothness of the approximation. A simple approach is to use several different values of order o(n^ν) and compare the results. As an alternative, following the BIC criterion commonly used for model selection (Burnham and Anderson, 2002), one can choose the value of m that minimizes BIC = −2 l_n(θ̂_n) + (2(p + m) + 3) log n.

Measurement error models have recently attracted considerable attention because data with measurement error are often encountered in many fields, such as medicine, economics, and engineering. Simply ignoring measurement error in the covariates may lead to a biased and inconsistent estimator. Cook and Stefanski (1994) developed the SIMEX method to correct the effect estimates in the presence of additive measurement error. Carroll et al. (1996) further investigated the asymptotic distribution of the SIMEX estimator. Since then, the SIMEX method has become a standard tool for correcting the biases induced by measurement error in covariates for many complex models.
In the following, we consider the regression analysis of panel count data with covariate measurement error, and we estimate the regression parameters by using the SIMEX approach.

Consider an event history study that involves n independent subjects. For subject i, let N_i(t) denote the cumulative number of events that have occurred up to time t, 0 ≤ t ≤ τ, where τ denotes the length of the study. Also for subject i, assume that there exist a vector of covariates denoted by Z_i and a follow-up or censoring time C_i. Suppose that each subject is observed only at a sequence of time points denoted by T_{i,1} < T_{i,2} < ... < T_{i,m_i} ≤ C_i, where m_i denotes the number of observations on subject i. Define O_i^*(t) = Σ_{j=1}^∞ I(T_{i,j} ≤ t) and O_i(t) = O_i^*(t ∧ C_i), the underlying and observed observation processes, respectively. For the effects of covariates on the underlying recurrent event process of interest N_i(t), we assume that given Z_i, the conditional mean function of N_i(t) has the form

E{N_i(t) | Z_i} = μ(t) + β^T Z_i t. (9)

Here β denotes the vector of regression parameters of interest, and μ(t) is a positive, unspecified, and nondecreasing function. We assume that given Z_i, O_i(t), N_i(t), and C_i are independent, and that O_i^*(t) and Z_i are independent. We assume an additive measurement error model

W_i = Z_i + U_i, (10)

where W_i is the observed surrogate and U_i is the measurement error with E(U_i) = 0 and Var(U_i) = Σ_u. In addition, U_i is independent of {Z_i, O_i^*(t), N_i(t), C_i}. When U_i is zero, there is no measurement error. For simplicity, we consider only the case where the measurement error covariance matrix Σ_u is known. Otherwise, Σ_u needs to be estimated first, e.g., by the replication experiments method in Liang et al. (1999).

To estimate β in the presence of covariate measurement error, we use the SIMEX algorithm introduced by Cook and Stefanski (1994). The SIMEX algorithm consists of the simulation step, the estimation step, and the extrapolation step.
It adds extra variability to the observed W_i in order to establish the trend between the bias induced by measurement error and the variance of the added measurement error, and then extrapolates this trend back to the case without measurement error. Here, we use the SIMEX algorithm together with an estimating equation to estimate β. The proposed algorithm is described as follows.

(Ⅰ) Simulation step

For each i = 1, 2, ..., n, we generate a sequence of variables [definition omitted in source], where U_{ib} ~ N(0, I_p), I_p is a p × p identity matrix, B is a given integer, and Λ = {λ_1, λ_2, ..., λ_M} is the grid of λ used in the extrapolation step. We let λ range from 0 to 2.

(Ⅱ) Estimation step

First, let H(t) = E{O_i^*(t)}. Since we have recurrent event data on the O_i^*(t)'s, it is natural to estimate H(t) by the Nelson-Aalen estimator, denoted by Ĥ(t). Thus, for estimation of β, it is natural to employ the following estimating equation: [equation (11) omitted in source]. By solving equation (11), we obtain an estimator of β, say β̂_b(λ), given by β̂_b(λ) = A_{nb}^{-1} D_{nb}(λ), where [definitions of A_{nb} and D_{nb}(λ) omitted in source]. With the estimated values β̂_b(λ), b = 1, 2, ..., B, we average them and obtain the final estimate of β as β̂(λ) = B^{-1} Σ_{b=1}^B β̂_b(λ).

(Ⅲ) Extrapolation step

For the extrapolant function, we consider the widely used quadratic function g(λ, Γ) = γ_1 + γ_2 λ + γ_3 λ², with Γ = (γ_1, γ_2, γ_3)^T. We fit a regression model of {β̂(λ), λ ∈ Λ} on {λ ∈ Λ} and denote by Γ̂ the estimated value of Γ. The SIMEX estimator of β is then defined as β̂_SIMEX = g(−1, Γ̂). At λ = 0, the extrapolant gives the naive estimator β̂_Naive = g(0, Γ̂), which neglects the measurement error by directly replacing Z_i with W_i.

For the asymptotic property of β̂_SIMEX, we have the following result.

Theorem 5. Under some regularity conditions, [asymptotic result omitted in source], with g_Γ(λ, Γ) = {∂/∂(Γ)^T} g(λ, Γ) and Σ(Γ) = D^{-1}(Γ) s(Γ) Σ s^T(Γ) D^{-1}(Γ), where D(Γ), s(Γ), and Σ are defined in the text.
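The extrapolation step (Ⅲ) can be sketched as follows for a single component of β. The sketch assumes that the averaged estimates β̂(λ) on the grid are already available; the numbers used for them below are made up for illustration, and the helper names (`solve_linear`, `fit_quadratic`, `extrapolate`) are hypothetical. It fits the quadratic extrapolant by ordinary least squares and evaluates it at λ = −1 and λ = 0.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_quadratic(lams: Sequence[float], betas: Sequence[float]) -> List[float]:
    """Least-squares fit of g(lam) = g1 + g2*lam + g3*lam^2; returns [g1, g2, g3]."""
    design = [[1.0, lam, lam * lam] for lam in lams]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, betas)) for i in range(3)]
    return solve_linear(xtx, xty)


def extrapolate(gamma: Sequence[float], lam: float) -> float:
    """Evaluate the fitted quadratic extrapolant at lam."""
    g1, g2, g3 = gamma
    return g1 + g2 * lam + g3 * lam * lam


# Made-up averaged estimates on a grid of lambda values in [0, 2].
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
beta_of_lambda = [1.0 - 0.3 * lam + 0.05 * lam * lam for lam in grid]

gamma_hat = fit_quadratic(grid, beta_of_lambda)
beta_simex = extrapolate(gamma_hat, -1.0)  # extrapolated to no measurement error
beta_naive = extrapolate(gamma_hat, 0.0)   # value at lambda = 0
```

For a vector-valued β, the same fit would be applied to each component separately. The quadratic form is the extrapolant named in the text; other extrapolants are possible but are not covered by this sketch.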
Keywords/Search Tags: Panel count data, Recurrent event, Nonparametric test, Interval-censored data, Sieve maximum likelihood estimation, Bernstein polynomials, Measurement error