
Statistical Inference For Integer-valued Time Series And Multivariate Panel Count Data

Posted on: 2013-07-15  Degree: Doctor  Type: Dissertation
Country: China  Candidate: H. X. Zhang  Full Text: PDF
GTID: 1220330395959672  Subject: Probability theory and mathematical statistics
Abstract/Summary:
Integer-valued time series data are fairly common; an example is the number of patients in a hospital at a specific time. A first-order autoregressive process for count data is defined through the thinning operator "∘" due to Steutel and Van Harn (1979). This process shares many properties with the AR(1) model and has been discussed by McKenzie (1988), Al-Osh and Alzaid (1987, 1992), and Alzaid and Al-Osh (1988). More recently, the first-order random coefficient integer-valued autoregressive (RCINAR(1)) process was introduced by Zheng et al. (2007); it is defined as

Xt = φt ∘ Xt-1 + Zt, (1)

where {φt} is an i.i.d. sequence with cumulative distribution function (CDF) Pφ on [0,1); {Zt} is an i.i.d. non-negative integer-valued sequence with probability mass function (PMF) fz such that E(Zt^4) < ∞; X0, {φt} and {Zt} are independent; E(X0^2) < ∞; and φt ∘ Xt-1 = ∑_{i=1}^{Xt-1} Bit, where the Bit are i.i.d. Bernoulli random variables with P(Bit = 1) = φt, independent of Xt-1. Let φ = E(φt) and λ = E(Zt), both assumed finite.

Let S(β) = ∑_{t=1}^{n} (Xt − φXt-1 − λ)^2 be the CLS criterion function, where β = (φ, λ)'. The associated estimating function is mt(β) = (m1t(β), m2t(β))', where m1t(β) = (Xt − φXt-1 − λ)Xt-1 and m2t(β) = Xt − φXt-1 − λ. Following Owen (1988), the profile empirical likelihood ratio function is maximized via the method of Lagrange multipliers: let G denote the corresponding Lagrangian, where γ and b ∈ R^2 are the Lagrange multipliers.
Setting the partial derivative of G with respect to pt to zero gives ∂G/∂pt = 1/pt − n b(β)'mt(β) + γ = 0, where b(β)' denotes the transpose of b(β) and b(β) satisfies ∑_{t=1}^{n} mt(β)/(1 + b(β)'mt(β)) = 0. The log empirical likelihood ratio statistic is denoted l(β).

Theorem 1. Under some regularity conditions, as n → ∞, l(β) →d χ²(2), where l(β) is defined in (2).

We then construct confidence regions for the parameter β.

Theorem 2. For the parameter β, the 100(1−α)% confidence region is Cr,n = {β | R(β) ≥ r}, and (i) Cr,n is a convex set; (ii) r = exp{−(1/2)χ²_{1−α}(2)}.

Next we consider the maximum empirical likelihood estimator (MELE) of β, defined as the maximizer of the log empirical likelihood function. Let B = B(β0, n^{−1/3}) = {β : ‖β − β0‖ ≤ n^{−1/3}}, where β0 = (φ0, λ0)' is the true parameter value. For the MELE we have the following results.

Theorem 3. For the MELE β̂ defined in (3), (i) P(β̂ is in the interior of B) → 1 as n → ∞; (ii) β̂ satisfies Q1n(β̂, b̂) = 0 and Q2n(β̂, b̂) = 0, where b̂ = b(β̂).

Theorem 4 establishes the asymptotic normality of the MELE β̂.

Furthermore, we consider the higher-order moments and cumulants of the RCINAR(1) process and prove that they satisfy a set of Yule-Walker type difference equations. The spectral and bispectral density functions are obtained, which characterize the RCINAR(1) process in the frequency domain. We use a frequency domain approach, the Whittle criterion, to estimate the parameters of the process. We also propose a frequency-domain test statistic for the hypothesis test H0: α = 0 versus H1: 0 < α < 1, where α is the mean of the random coefficient of the process. Let fn(ω) denote the normalized spectral density function of {Xt}; for convenience we write f(ω) for fn(ω). For process (1) we have

f(ω) = (1/2π) · (1 − α²)/(1 − 2α cos ω + α²), −π ≤ ω ≤ π.

Under H0 we have Xt = Zt, an i.i.d. sequence with spectral density f(0) = 1/2π, while under H1 we have f(0) > 1/2π.
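A small sketch helps fix ideas before turning to estimation of f(0): the code below simulates model (1) under assumed distributions (φt ~ Beta(2, 5), Zt ~ Poisson(1) — our illustrative choices, not the thesis') and recovers β = (φ, λ)' by minimizing the CLS criterion, which reduces to least squares of Xt on (Xt-1, 1).

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_rcinar1(n, rng, a=2.0, b=5.0, lam=1.0):
    """Simulate X_t = phi_t o X_{t-1} + Z_t via binomial thinning.

    Assumed, illustrative distributions (not specified in the thesis):
    phi_t ~ Beta(a, b) on [0, 1) and Z_t ~ Poisson(lam).
    """
    x = np.zeros(n + 1, dtype=np.int64)
    for t in range(1, n + 1):
        phi_t = rng.beta(a, b)                   # random coefficient phi_t
        thinned = rng.binomial(x[t - 1], phi_t)  # sum of X_{t-1} Bernoulli(phi_t) draws
        x[t] = thinned + rng.poisson(lam)        # add the innovation Z_t
    return x[1:]

x = simulate_rcinar1(2000, rng)

# CLS estimate of beta = (phi, lambda)': least squares of X_t on (X_{t-1}, 1),
# which minimizes S(beta) = sum (X_t - phi X_{t-1} - lambda)^2.
D = np.column_stack([x[:-1], np.ones(len(x) - 1)])
phi_hat, lam_hat = np.linalg.lstsq(D, x[1:], rcond=None)[0]
```

With φ = E(Beta(2, 5)) = 2/7 ≈ 0.29 and λ = 1, the estimates should land near these values for n = 2000.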
This motivates a test that rejects H0 for large values of f̂(0), where f̂ is an estimate of the density based on the empirical autocorrelation function. To obtain an estimate of f(0), define the sample autocorrelation function ρ̂(j) = γ̂(j)/γ̂(0) of {Xt}, where the sample autocovariance function is γ̂(j) = n^{−1} ∑_{t=j+1}^{n} (Xt − X̄)(Xt-j − X̄), with X̄ = n^{−1} ∑_{t=1}^{n} Xt. A kernel-based estimator of f(0) is

f̂(0) = (1/2π) [1 + 2 ∑_{j=1}^{n−1} K(j/q) ρ̂(j)],

where q is a positive integer and K(·) is a kernel function satisfying the following assumption.

Assumption A1. The function K(·): R → [−1, 1] is symmetric, continuous at zero and at all but a finite number of points, with K(0) = 1 and ∫_0^∞ |K(z)| dz < ∞. Furthermore, 0 < ∑_{j=1}^{∞} K²(j/q) < ∞ for any finite positive integer q.

The proposed test statistic S is based on a comparison between f̂(0) and 1/2π.

Theorem 5. Suppose Assumption A1 holds and q/n → 0. Then S →d N(0, 1) under H0, where S is defined in (4).

We also study the empirical likelihood method for the pth-order random coefficient integer-valued autoregressive (RCINAR(p)) process proposed by Zheng et al. (2006). The RCINAR(p) process {Xt} satisfies

Xt = φ1(t) ∘ Xt-1 + φ2(t) ∘ Xt-2 + … + φp(t) ∘ Xt-p + Zt, (5)

where {φi(t)} is an i.i.d. sequence with cumulative distribution function Pφ on [0, 1) and {Zt} is an i.i.d. non-negative integer-valued sequence with probability mass function fz > 0. In addition, {φi(t), 1 ≤ i ≤ p} and {Zt} are independent sequences. Let λ = E(Zt) and φi = E(φi(t)), i = 1, …, p, and suppose ∑_{i=1}^{p} φi < 1. For the process (5), let S(β) = ∑_{t} (Xt − ∑_{i=1}^{p} φiXt-i − λ)^2, with β = (φ1, …, φp, λ)', be the conditional least squares (CLS) criterion function. Taking the derivative of S(β) with respect to β yields the estimating equation ∑_{t=1}^{n} mt(β) = 0. Let Ft = σ(X0, X1, …, Xt). It is easy to verify that {mt, Ft, t ≥ 1} is a martingale difference sequence. It follows from Mykland (1995) or Chuang and Chan (2002) that the profile ELR function is

R(β) = sup{∏_{t=1}^{n} nωt | ωt ≥ 0, ∑_{t=1}^{n} ωt = 1, ∑_{t=1}^{n} ωtmt = 0},

and the maximum may be found by the method of Lagrange multipliers: let G denote the corresponding Lagrangian, where γ ∈ R and b ∈ R^{p+1} are the Lagrange multipliers.
Setting the partial derivative of G with respect to ωt to zero gives ∂G/∂ωt = 1/ωt − n b'mt + γ = 0. This yields 0 = ∑_{t=1}^{n} ωt ∂G/∂ωt = n + γ, so γ = −n, and ωt = (1/n) · 1/(1 + b'mt), t = 1, …, n, where b' denotes the transpose of b and b satisfies

∑_{t=1}^{n} mt/(1 + b'mt) = 0. (7)

Let b(β) denote the solution to (7), which can be obtained by numerical algorithms. The log ELR statistic then has the form given in (8).

Theorem 6. Under some regularity conditions, as n → ∞, l(β) →d χ²(p+1), where l(β) is defined in (8).

We now construct confidence regions for β.

Theorem 7. For β, the 100(1−α)% confidence region can be written as Cr,n = {β | R(β) ≥ r}, where R(β) is defined in (6), and (i) Cr,n is a convex set; (ii) r = exp{−(1/2)χ²_{p+1,1−α}}.

In practice, one frequently encounters a time series of counts that are small in value and show a trend with relatively large fluctuations. For this type of non-stationary integer-valued time series, as with the usual real-valued ARIMA model, differencing is commonly used to remove the time trend and seasonality. The differenced series is still integer-valued but can take negative values, so it cannot be fitted by the previous INAR-type models, which apply only to non-negative valued time series. To model such differenced series, Kim and Park (2008) introduced the integer-valued autoregressive process of order p with signed binomial thinning (INARS(p)). We introduce a new operator, denoted "⊛" and called the signed generalized power series thinning operator, which extends the signed binomial thinning of Kim and Park (2008). Define sgn(x) = 1 if x ≥ 0 and sgn(x) = −1 if x < 0. With this notation, the signed generalized power series thinning operator is defined as

α ⊛ X = sgn(α)sgn(X) ∑_{j=1}^{|X|} Wj,

where the Wj are i.i.d.
random variables following a generalized power series distribution, with E(Wj) = |α|, Var(Wj) = β, and {Wj} independent of Xt.

To handle non-stationary integer-valued time series with large dispersion, we introduce a new process, GINARS(p), defined by the recursive equation

Xt = α1 ⊛ Xt-1 + α2 ⊛ Xt-2 + … + αp ⊛ Xt-p + εt, t ≥ 1, (10)

where αi ⊛ Xt-i = sgn(αi)sgn(Xt-i) ∑_{j=1}^{|Xt-i|} Wj(i), the Wj(i) are i.i.d. with a generalized power series distribution with finite mean |αi| and variance βi, i = 1, …, p, and {εt} is a sequence of i.i.d. integer-valued random variables. The εt are uncorrelated with Xt-i for i ≥ 1, all counting series {Wj(i)} in (10) are independent, and εt is independent of X0. Let με = E(εt), μ|ε| = E|εt| and σε² = Var(εt), all assumed finite. For this process we have the following results.

Proposition 1. Suppose {Xt} is a stationary process and ∑_{i=1}^{p} αi ≠ 1. Then for t ≥ 1, (i) E(Xt | Xt-i, 1 ≤ i ≤ p) = ∑_{i=1}^{p} αiXt-i + με; (ii) E(Xt) = με/(1 − ∑_{i=1}^{p} αi); (iii) Var(Xt | Xt-i, 1 ≤ i ≤ p) = ∑_{i=1}^{p} βi|Xt-i| + σε²; (iv) with γk = Cov(Xt, Xt-k), the autocovariances satisfy the Yule-Walker type equations γk = ∑_{i=1}^{p} αiγk-i, k ≥ 1.

Theorem 8. Suppose all the eigenvalues of the coefficient matrix A (built from α1, …, αp) are inside the unit circle. Then there exists a unique strictly stationary integer-valued random series {Xt} satisfying Xt = α1 ⊛ Xt-1 + α2 ⊛ Xt-2 + … + αp ⊛ Xt-p + εt, t = 0, ±1, ±2, …, with Cov(Xs, εt) = 0 for s < t. Furthermore, the process is ergodic.

For the GINARS(p) process, we mainly consider two methods of parameter estimation: Yule-Walker (YW) and conditional least squares (CLS). An advantage of these methods is that they do not require specifying the exact family of distributions of the process.
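To make the signed thinning concrete, here is a minimal simulation sketch for p = 1, using Poisson counting series (a power series distribution with mean |α|) and Skellam-type innovations — both our illustrative choices, not the thesis' — followed by the Yule-Walker estimate γ̂1/γ̂0 implied by the autocovariance relation above.

```python
import numpy as np

rng = np.random.default_rng(7)

def signed_thin(alpha, x, rng):
    """Signed thinning sketch: sgn(alpha) sgn(x) * sum of |x| counting draws,
    with counting series W_j ~ Poisson(|alpha|) (an assumed power series law)."""
    if x == 0:
        return 0
    total = rng.poisson(abs(alpha), size=abs(x)).sum()
    return int(np.sign(alpha) * np.sign(x) * total)

def simulate_ginars1(n, alpha, rng):
    """GINARS(1) sketch: X_t = alpha (signed-thin) X_{t-1} + eps_t, where eps_t
    is a difference of two Poisson(1) draws (integer-valued, mean zero)."""
    x = np.zeros(n + 1, dtype=int)
    for t in range(1, n + 1):
        eps = rng.poisson(1.0) - rng.poisson(1.0)
        x[t] = signed_thin(alpha, x[t - 1], rng) + eps
    return x[1:]

x = simulate_ginars1(5000, 0.5, rng)

# Yule-Walker step for p = 1: gamma_1 = alpha * gamma_0, so alpha_hat = gamma_1 / gamma_0.
xc = x - x.mean()
gamma0 = np.dot(xc, xc) / len(x)
gamma1 = np.dot(xc[1:], xc[:-1]) / len(x)
alpha_yw = gamma1 / gamma0
```

The simulated path takes both signs, as a differenced count series would, and α̂ should be close to the true α = 0.5.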
From Proposition 1(iv) we can obtain the Yule-Walker equations. Let γ̂k denote the sample autocovariances, computed around X̄ = (1/n) ∑_{t=1}^{n} Xt. Replacing γk with γ̂k in (12) yields the Yule-Walker estimators α̂iYW, i = 1, …, p. Using these, define μ̂εYW = ε̄n and σ̂ε²YW, where ε̄n = (1/n) ∑_{t=1}^{n} ε̂t, ε̂t = Xt − ∑_{i=1}^{p} α̂iYW Xt-i, and β̂i is a strongly consistent estimator of βi, obtainable by the Yule-Walker or conditional least squares method.

Theorem 9. The estimators α̂1YW, α̂2YW, …, α̂pYW, μ̂εYW and σ̂ε²YW are strongly consistent.

Let η = (α1, …, αp, με)'. The CLS estimator of η is

η̂CLS = Q^{−1}q, (13)

where Q and q are the sample matrix and vector arising from minimizing the CLS criterion.

Theorem 10. Suppose E|Xt|^4 < ∞. Then for the CLS estimator η̂CLS given by (13), as n → ∞, √n(η̂CLS − η) is asymptotically normal.

We now extend "⊛" to a generalized signed thinning operator "⊛G" and propose a new random coefficient process, the generalized first-order random coefficient integer-valued autoregressive process with signed thinning operator (GRCINARS(1)). An integer-valued stochastic process {Xt} is said to be a GRCINARS(1) process if it satisfies the difference equation

Xt = αt ⊛G Xt-1 + εt, (14)

where (i) {εt} is a sequence of i.i.d. integer-valued random variables with E(εt) = με, E(|εt|) = μ|ε| and Var(εt) = σε²; {αt} is an i.i.d. random sequence on R, independent of {εt}; (ii) the generalized signed thinning operator ⊛G is defined as follows: given αt, all counting series {Wj(t)} of the thinning operation are mutually independent non-negative integer-valued random variables, independent of {εt}, with E(Wj(t) | αt) = |αt| and Var(Wj(t) | αt) = βt, where βt depends only on αt; (iii) E(αt) = μα, E|αt| = μ|α|, Var(αt) = σα², E(βt) = β, all assumed finite.

Proposition 2. Suppose the GRCINARS(1) process {Xt} is stationary and μα ≠ 1. Then for t ≥ 1, (i) E(Xt | Xt-1) = μαXt-1 + με; (ii) E(Xt) = με/(1 − μα); (iii) Var(Xt | Xt-1, αt) = |Xt-1|βt + σε²; (iv) Var(Xt | Xt-1) = Xt-1²σα² + |Xt-1|β + σε²; (v) with γk = Cov(Xt, Xt-k), γk = μα^k γ0, k = 1, 2, ….

Theorem 11. If 0 < σα² + μα² < 1 and μ|α| < 1, then there exists a unique strictly stationary integer-valued random series
{Xt} satisfying (14); furthermore, the process is ergodic.

Let Q(θ) = ∑_{t=1}^{n} {Xt − E(Xt | Xt-1)}² = ∑_{t=1}^{n} (Xt − μαXt-1 − με)², where θ = (μα, με)'. The CLS estimator is defined by θ̂CLS = arg min Q(θ). The following result establishes its asymptotic distribution.

Theorem 12. Suppose E|Xt|^4 < ∞. Then under the regularity conditions of Klimko and Nelson (1978), the CLS estimator θ̂CLS is consistent and has the asymptotic distribution √n(θ̂CLS − θ) →d N(0, V^{−1}WV^{−1}), where W = (E(u1²(θ) · ∂E(X1|X0)/∂θi · ∂E(X1|X0)/∂θj))_{1≤i,j≤2}, u1(θ) = X1 − E(X1|X0), and V = (E(∂E(X1|X0)/∂θi · ∂E(X1|X0)/∂θj))_{1≤i,j≤2}.

From Proposition 2(v) we obtain μα = γ1/γ0, where γ1 = Cov(Xt, Xt-1) and γ0 = Var(Xt). Thus the Yule-Walker estimator of μα is μ̂αYW = γ̂1/γ̂0, with the sample autocovariances computed around X̄ = (1/n) ∑_{t=1}^{n} Xt. Using μ̂αYW, define μ̂εYW = ε̄n, where ε̄n = (1/n) ∑_{t=1}^{n} ε̂t and ε̂t = Xt − μ̂αYW Xt-1.

Theorem 13. μ̂αYW and μ̂εYW are strongly consistent estimators of μα and με, respectively.

We now propose the generalized pth-order random coefficient integer-valued autoregressive process with signed thinning operator (GRCINARS(p)). An integer-valued stochastic process {Xt} is said to be a GRCINARS(p) process if it satisfies the difference equation

Xt = α1(t) ⊛G Xt-1 + α2(t) ⊛G Xt-2 + … + αp(t) ⊛G Xt-p + εt, (16)

where (i) {εt} is a sequence of i.i.d. integer-valued random variables with E(εt) = με, E(|εt|) = μ|ε| and Var(εt) = σε²; each {αi(t)} is an i.i.d. random sequence on R with E(αi(t)) = μαi, E|αi(t)| = μ|αi|, Var(αi(t)) = σαi², Var(|αi(t)|) = σ|αi|², and Cov(αi(t), αj(t)) = 0 for i ≠ j; moreover, the {αi(t)} are independent of εt, i = 1, …, p; (ii) given αi(t), the generalized signed thinning operator ⊛G is defined as before, with counting series {Wj(t,i)} that are i.i.d. non-negative integer-valued random variables satisfying E(Wj(t,i) | αi(t)) = |αi(t)| and Var(Wj(t,i) | αi(t)) = βi(t), where βi(t) depends only on αi(t), i = 1, …, p; furthermore, all counting series {Wj(t,i)} in (16) and εt are independent of each other.

Proposition 3. Suppose the GRCINARS(p) process {Xt} is stationary and ∑_{i=1}^{p}
μαi ≠ 1 and E(βi(t)) = βi. Then (i) E(Xt | Xt-i, 1 ≤ i ≤ p) = ∑_{i=1}^{p} μαiXt-i + με; (ii) E(Xt) = με/(1 − ∑_{i=1}^{p} μαi); (iii) Var(Xt | Xt-i, αi(t), 1 ≤ i ≤ p) = ∑_{i=1}^{p} βi(t)|Xt-i| + σε²; (iv) Var(Xt | Xt-i, 1 ≤ i ≤ p) = ∑_{i=1}^{p} (σαi²Xt-i² + βi|Xt-i|) + σε²; (v) with γk = Cov(Xt, Xt-k), γk = ∑_{i=1}^{p} μαiγk-i, k = 1, 2, ….

Theorem 14. Suppose ∑_{i=1}^{p} μ|αi| < 1 and all the eigenvalues of E(A1' ⊗ A1) are inside the unit circle, where A1 is the random coefficient matrix of the process. Then there exists a unique strictly stationary integer-valued random series {Xt} satisfying (16); furthermore, the process is ergodic.

For the GRCINARS(p) process {Xt}, we mainly consider three estimation methods: conditional least squares (CLS), Yule-Walker (YW) and weighted conditional least squares (WCLS). We take αi(t) = f(Yi(t)), i = 1, …, p, where {Yi(t)} is a sequence of i.i.d. random variables with support on R and f is chosen so that the strict stationarity and ergodicity conditions of Theorem 14 are satisfied. In the remainder of this part, one common choice is αi(t) = −sin(Yi(t)), i = 1, …, p, where Yi(t) ~ Exp(λi) with density fi(y) = (1/λi)e^{−y/λi} I(0,∞)(y), 0 < λi < 1, chosen to satisfy the conditions of Theorem 14.

Let S(T) = ∑_{t=p+1}^{n} (Xt − ∑_{i=1}^{p} μαiXt-i − με)², where T = (μα1, …, μαp, με)', be the CLS criterion function. The CLS estimator of T is obtained by minimizing S(T):

T̂CLS = Q^{−1}q. (17)

Theorem 15. For the CLS estimator T̂CLS given by (17), √n(T̂CLS − T) is asymptotically normal.

For the GRCINARS(p) process {Xt} we can also derive the Yule-Walker equations. Replacing γk with the sample autocovariance γ̂k in (18), with X̄ = (1/n) ∑_{t=1}^{n} Xt, yields the Yule-Walker estimators μ̂αiYW, i = 1, …, p. Using these, define (i) ε̂t = Xt − ∑_{i=1}^{p} μ̂αiYW Xt-i, ε̄n = (1/n) ∑_{t=1}^{n} ε̂t, and μ̂εYW = ε̄n; (ii) μ̂|αi| = ∫_0^∞ |sin(y)| (1/λ̂i)e^{−y/λ̂i} dy and σ̂|αi|² = ∫_0^∞ sin²(y) (1/λ̂i)e^{−y/λ̂i} dy − μ̂|αi|², where λ̂i = (−1 + √(1 − 4(μ̂αiYW)²))/(2μ̂αiYW); (iii) β̂i, a strongly consistent estimator of βi, i = 1, 2, …, p, obtainable by the Yule-Walker or conditional least squares method.

Theorem 16. The estimators μ̂α1YW, …, μ̂αpYW, μ̂εYW and σ̂ε²YW are strongly consistent.
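The moment relation behind the λ̂i formula can be checked numerically: for Y ~ Exp(λ) with density (1/λ)e^{−y/λ}, E(−sin Y) = −λ/(1 + λ²), so μα solves μαλ² + λ + μα = 0, giving λ = (−1 + √(1 − 4μα²))/(2μα). A Monte Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
lam_true = 0.4                      # scale of Y ~ Exp(lam_true), with 0 < lam < 1

# Estimate mu_alpha = E(-sin Y) by Monte Carlo, then invert the quadratic
# mu_alpha * lam^2 + lam + mu_alpha = 0 to recover lam.
y = rng.exponential(lam_true, size=1_000_000)
mu_alpha = np.mean(-np.sin(y))      # should be close to -lam/(1 + lam^2)

lam_hat = (-1.0 + np.sqrt(1.0 - 4.0 * mu_alpha**2)) / (2.0 * mu_alpha)
```

With λ = 0.4, μα ≈ −0.345 and the recovered λ̂ should be very close to 0.4, confirming that the Yule-Walker estimate of μαi identifies λi.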
Let θ = (σαi², βi, 1 ≤ i ≤ p, σε²)' and recall from Proposition 3(iv) the expression for the one-step conditional variance Vθ(Xt | Xt-i, 1 ≤ i ≤ p). The WCLS estimator of T = (μα1, …, μαp, με)' is

T̃ = Qn^{−1}qn, (19)

where Qn = (Qij)_{(p+1)×(p+1)} with Qij = Qji, 1 ≤ i ≤ j ≤ p+1.

Theorem 17. For the WCLS estimator T̃ given by (19), as n → ∞, √n(T̃ − T) →d N(0, T(θ)^{−1}), where T(θ) = (Tij(θ))_{(p+1)×(p+1)} with Tij(θ) = Tji(θ), Tii(θ) = E(Vθ^{−1}(Xp+1 | Xp+1-i, 1 ≤ i ≤ p) Xp+1-i²) for 1 ≤ i ≤ p, T(p+1)(p+1)(θ) = E(Vθ^{−1}(Xp+1 | Xp+1-i, 1 ≤ i ≤ p)), Tij(θ) = E(Vθ^{−1}(Xp+1 | Xp+1-i, 1 ≤ i ≤ p) Xp+1-iXp+1-j) for 1 ≤ i ≠ j ≤ p, and Ti(p+1)(θ) = E(Vθ^{−1}(Xp+1 | Xp+1-i, 1 ≤ i ≤ p) Xp+1-i) for 1 ≤ i ≤ p.

The analysis of panel count data has recently attracted considerable attention. By panel count data, we mean data arising from event history studies that concern recurrent events and in which study subjects are monitored or observed only at discrete time points rather than continuously. Such data occur in many fields, including demographic and epidemiological studies, medical research, reliability experiments, tumorigenicity experiments and sociological studies (Kalbfleisch and Lawless, 1985; Thall and Lachin, 1988; Sun, 2006). Multivariate panel count data arise when the event history study involves several related recurrent events. We discuss regression analysis of multivariate panel count data with a focus on variable, or covariate, selection.

Consider an event history study consisting of n independent subjects, in which each subject may experience K different types of recurrent events. For subject i, let Nik(t) denote the total number of type k events that have occurred up to time t, 0 ≤ t ≤ L, where L denotes the study length, i = 1, …, n, k = 1, …, K. For each i and k, suppose there exists a positive random variable Ci representing the censoring or follow-up time of subject i, and a d×1 vector of covariates Xi = (Xi1, …, Xid)' that may affect the occurrence rates of the events. Without loss of generality, we assume that the expected value of Xi is zero and that d is fixed.
In the following, we assume that, given Xi, the marginal mean function of Nik(t) has the form

E{Nik(t) | Xi} = μk(t)gN(Xi'β). (20)

Here μk(t) is an unknown continuous baseline mean function, β = (β1, …, βd)' is a d×1 vector of regression parameters, and gN(·) is a known positive function assumed to be strictly increasing and twice differentiable.

To describe the observed data, suppose that Nik(·) is observed only at finite time points Tik,1 ≤ … ≤ Tik,mik, where mik denotes the potential or scheduled number of observations on the kth type of recurrent event for subject i, i = 1, …, n, k = 1, …, K. That is, we only have panel count data. For each i and k, define H̃ik(t) = ∑_{j=1}^{mik} I(Tik,j ≤ t) and Hik(t) = H̃ik{min(t, Ci)}, the point process characterizing the actual observation process on subject i with respect to the kth type of recurrent event. Following He et al. (2008), we assume that H̃ik(t) is a counting process with the marginal mean function

E{H̃ik(t) | Xi} = vk(t)gH(Xi'γ), (21)

given Xi. Here vk(t) is a completely unknown continuous baseline mean function, γ denotes the effect of the covariates on H̃ik, and gH(·) is a known positive function assumed to be strictly increasing and twice differentiable.

To select the significant variables, as mentioned above, one common approach is to apply a penalized method. One penalty function, proposed recently by Dicker et al. (2012), is the seamless-L0 (SELO) penalty pλ,τ(θ), where λ > 0 and τ > 0 are tuning parameters and pλ,τ(θ) ≈ λI{θ ≠ 0} for small τ. We first consider the case where the covariates have no effect on the observation process; for this, without loss of generality, we assume gH(·) = 1. To present the inference procedure, the needed quantities are defined for each i and k, i = 1, …, n, k = 1, …, K.
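For reference, the SELO penalty of Dicker et al. has the closed form pλ,τ(θ) = (λ/log 2) · log(|θ|/(|θ| + τ) + 1); a minimal sketch showing how it interpolates between 0 at θ = 0 and approximately λ away from zero:

```python
import numpy as np

def selo(theta, lam, tau):
    """Seamless-L0 (SELO) penalty: a smooth surrogate for lam * I(theta != 0).

    p(0) = 0 and p(theta) -> lam as |theta| grows well beyond tau;
    smaller tau makes the surrogate closer to the L0 indicator.
    """
    t = np.abs(theta)
    return (lam / np.log(2.0)) * np.log(t / (t + tau) + 1.0)

# For small tau the penalty is nearly flat at lam away from zero:
vals = selo(np.array([0.0, 0.05, 0.5, 5.0]), lam=1.0, tau=0.01)
```

Unlike the L1 penalty, SELO barely grows for large |θ|, which is why it yields nearly unbiased estimates of large coefficients while still shrinking small ones.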
To estimate β, we propose to minimize a penalized function ℓa(β) and define the penalized estimate of β as β̂a = arg min ℓa(β). To establish the asymptotic properties of β̂a, let β0 denote the true value of β and write β0 = (β10, …, βd0)' = (β10', β20')', where β10 and β20 denote the nonzero and zero components of β0, respectively. Write β̂a = (β̂1a', β̂2a')' accordingly. Define

b = (p'λn,τn(|β10|)sgn(β10), …, p'λn,τn(|βs0|)sgn(βs0))', Σ = diag{p''λn,τn(|β10|), …, p''λn,τn(|βs0|)},

an = max_{1≤j≤s} {p'λn,τn(|βj0|) : βj0 ≠ 0} and bn = max_{1≤j≤s} {p''λn,τn(|βj0|) : βj0 ≠ 0}, where s denotes the number of nonzero components of β0.

Theorem 18. Suppose some regularity conditions hold. Then there exists a local minimizer β̂a of ℓa(β) such that ‖β̂a − β0‖ = Op(n^{−1/2} + an).

Theorem 19. Assume some regularity conditions hold. Then, with probability tending to 1, the √n-consistent estimate in Theorem 18 satisfies (i) sparsity: β̂2a = 0; and (ii) asymptotic normality: √n(Aa(β10) + Σ){(β̂1a − β10) + (Aa(β10) + Σ)^{−1}b} →d N(0, Ta(β10)), where →d denotes convergence in distribution.

For estimation of the covariance matrix of β̂1a, following the idea of Fan and Peng (2004), we propose the sandwich estimate

Cov(β̂1a) = (1/n){Aa(β̂1a) + Σλ,τ(β̂1a)}^{−1} Ta(β̂1a) {Aa(β̂1a) + Σλ,τ(β̂1a)}^{−1}.

In the above, Aa(β̂1a), Σλ,τ(β̂1a) and Ta(β̂1a) are the s×s upper-left submatrices of the corresponding full matrices, where A and T are given below and c⊗2 = cc' for a vector c.

Now we discuss the situation where the observation process H̃ik(t) may depend on the covariates through model (21) but is independent of Nik(t) given Xi. Corresponding to ℓa(β), we consider a penalized function ℓb(β, γ), and it is natural to minimize ℓb(β, γ) for estimation of β. To establish the asymptotic properties of β̂b, as with β̂a, suppose we can write β̂b = (β̂1b', β̂2b')'. Using the same approach as for β̂a, one can show that β̂b exists, and we have the following results.

Theorem 20. Suppose the conditions given in Theorem 19 hold.
Then, as n → ∞, with probability tending to 1, we have (i) sparsity: β̂2b = 0; and (ii) asymptotic normality: √n(Ab1(β10) + Σ){(β̂1b − β10) + (Ab1(β10) + Σ)^{−1}b} →d N(0, Tb(β10)).

For estimation of the covariance matrix of β̂1b, using the same idea as for β̂1a, we propose

Cov(β̂1b) = (1/n){Ab1(β̂1b) + Σλ,τ(β̂1b)}^{−1} Tb(β̂1b) {Ab1(β̂1b) + Σλ,τ(β̂1b)}^{−1},

where Ab1(β̂1b), Σλ,τ(β̂1b) and Tb(β̂1b) are the upper-left s×s submatrices of Ab(β̂b), Σλ,τ(β̂b) = diag{pλ,τ(|β̂b1|)/|β̂b1|, …, pλ,τ(|β̂bd|)/|β̂bd|} and Tb(β̂b) = (Id, −Ab2(β̂b)B(γ̂)^{−1}) Φ (Id, −Ab2(β̂b)B(γ̂)^{−1})', respectively, with dMik(t; γ) = dNik(t) − Yik(t)gH(Xi'γ)dvk(t).

Multivariate panel count data arise in event history studies of recurrent events when there are several related events and study subjects can be examined or observed only at discrete time points rather than over continuous periods. In these situations, a complication that may arise is that the observation time points or process may be related to the underlying recurrent event process of interest; that is, the observation processes are informative. Clearly, a valid analysis must take into account both the relationship among the different types of recurrent events and the informative observation process. To address these issues, we propose a robust joint modeling approach. Consider an event history study that involves n independent subjects, in which each subject may experience K different types of recurrent events.
For subject i, suppose there exists a vector of covariates Xi = (Xi1, …, Xid)', and let Nik(t) denote the total number of type k events that have occurred up to time t, i = 1, …, n, k = 1, …, K. It is assumed that, given Xi and a positive latent variable Zi, the marginal mean of Nik(t) has the form

E{Nik(t) | Xi, Zi} = μk(t)h(Zi) exp(Xi'β). (22)

Here μk(t) is an unknown continuous baseline mean function, h(·) is a completely unspecified positive function, and β = (β1, …, βd)' denotes the regression parameters. For each i and k, define Hik(t) = H̃ik{min(t, Ci)}, where H̃ik(t) = ∑_{j=1}^{mik} I(Tik,j ≤ t). Then Hik(t) is a point process characterizing the observation process on subject i with respect to the kth type of recurrent event. In the following, we assume that H̃ik(t) is a counting process whose marginal mean satisfies

E{H̃ik(t) | Xi, Zi} = vk(t)ZigH(Xi'γ) (23)

given Xi and Zi. Here vk(t) is a completely unknown continuous baseline mean function and gH(·) is a completely unspecified positive function.

Let θ = (θ1, …, θK)' and α = (β', θ')'. Also let ek denote the K-dimensional vector of zeros except for its kth entry, which equals one, and let Xik = (Xi', ek')'. To estimate α, we propose an estimating equation in which the Wi are weights that may depend on Xi. Let α̂ = (β̂', θ̂')' denote the estimate of α given by the solution to this equation. To establish the asymptotic distribution of α̂, let α0 = (β0', θ0')' denote the true value of α. One can then show that, under the regularity conditions, √n(α̂ − α0) has an asymptotically normal distribution with mean zero and covariance matrix T^{−1}ΣT^{−1}, where Σ = E(φiφi'). Furthermore, this covariance matrix can be consistently estimated by n^{−1}T̂^{−1}Σ̂T̂^{−1}, with φ̂i = ∑_{k=1}^{K} WiXik{Nik − mik exp(Xik'α̂)}.

For any regression analysis, a basic question is the appropriateness of the assumed regression model.
We consider checking the adequacy of models (22) and (23). Following Lin et al. (2000), define the residual process

Rik(t) = ∫_0^t Nik(u) dHik(u) − mik exp(Xi'β̂)Âk(t).

Motivated by Lin et al. (2000) and Sun et al. (2007), we consider a supremum-type statistic based on the process φ(t, x). It then follows from the method given in Lin et al. (2000) that one can approximate the distribution of φ(t, x) by that of a zero-mean Gaussian process. Here (G1, …, Gn) are independent standard normal variables, independent of the observed data, and d̂i is the vector T̂φ̂i without the last K entries. To perform the goodness-of-fit test of models (22) and (23) based on (25), one first obtains a large number of realizations of φ̂(t, x) by repeatedly generating standard normal samples (G1, …, Gn) given the observed data. One then applies the supremum test statistic sup_{t,x}|φ(t, x)| and obtains the p-value by comparing its observed value with the large number of realizations of sup_{t,x}|φ̂(t, x)|.
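The resampling step can be sketched generically: given each subject's contribution to the process φ(t, x) evaluated on a finite grid (the matrix below is a hypothetical stand-in for the quantities in (25)), perturb the contributions by i.i.d. standard normal multipliers (G1, …, Gn) and compare the observed supremum with the resampled suprema.

```python
import numpy as np

def sup_test_pvalue(obs_sup, contrib, n_rep=2000, rng=None):
    """Multiplier-resampling p-value for a supremum-type goodness-of-fit statistic.

    contrib: (n, m) array; row i is subject i's contribution to the process
    at m grid points (a stand-in for the terms in (25)). Each replicate
    perturbs the rows with i.i.d. N(0, 1) multipliers G_1, ..., G_n.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    n = contrib.shape[0]
    sups = np.empty(n_rep)
    for r in range(n_rep):
        g = rng.standard_normal(n)                        # (G_1, ..., G_n)
        sups[r] = np.abs(g @ contrib).max() / np.sqrt(n)  # sup over the grid
    return float(np.mean(sups >= obs_sup))

# Demo with hypothetical contributions for 100 subjects on a 25-point grid:
demo_rng = np.random.default_rng(3)
contrib = demo_rng.normal(size=(100, 25))
pval = sup_test_pvalue(2.5, contrib)
```

A large observed supremum relative to the resampled ones yields a small p-value, indicating lack of fit of models (22) and (23).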
Keywords/Search Tags: Integer-valued time series, Empirical likelihood, Signed thinning operator, Multivariate panel count data, Seamless-L0 penalty