Regression Analysis Of Misclassified Current Status Data

Posted on:2024-06-22

Degree:Doctor

Type:Dissertation

Country:China

Candidate:W S Wang

Full Text:PDF

GTID:1527307064474754

Subject:Statistics

Abstract/Summary:

For many years, current status data have been a major topic of research among statisticians. Current status data are failure time data arising from studies in which each subject is observed only once, and the only information obtained is whether the failure event of interest has occurred by the observation time. In other words, the failure time of interest is either left- or right-censored. More specifically, let $T$ denote the failure time of interest and $C$ the monitoring or observation time. Then with current status data, the only information available about $T$ is whether $T < C$ or $T \geq C$, with $C$ observed. Current status data occur in many fields, including econometrics, epidemiology, demography, and tumorigenicity experiments (Huang, 1996; Sun, 2006; Titman, 2017; Zhang et al., 2005).

A basic assumption behind most existing methods for current status data is that the failure status at the observation time is observed accurately. This may not be true in practice, and the resulting data are often referred to as misclassified current status data. A main reason is the use of diagnostic tests with imperfect sensitivity and specificity; a concrete example is that nucleic acid tests for COVID-19 can give false results. In the presence of misclassification, an analysis that ignores it can lead to biased or misleading results and conclusions (Gu et al., 2015; García-Zattera et al., 2016). This thesis mainly considers regression analysis of misclassified current status data.

The previous literature assumes that the failure time of interest and the observation time are independent of each other. This assumption may not hold in practice; for example, the time at which a patient goes to the hospital for a physical examination is likely to be related to the patient's health condition. When $T$ and $C$ are dependent, the data are generally called current status data with informative observation times, or
dependent current status data. However, few authors have so far considered regression analysis of misclassified current status data with informative observation times, which is one of the issues addressed in this thesis.

This thesis focuses on four regression analysis problems related to misclassified current status data. First, in Chapter 2, we discuss regression analysis of current status data under the additive hazards model when the failure status may be misclassified. In particular, we propose a nonparametric maximum likelihood approach. To implement the method, a novel EM algorithm is developed, and the asymptotic properties of the resulting estimators are established. Furthermore, the estimated regression parameters are shown to be semiparametrically efficient. A simulation study demonstrates the empirical performance of the proposed methodology and its substantial advantages over the naive method that ignores misclassification.

Second, in Chapter 3, we discuss variable selection based on such data. Variable selection is often required in practice, and many methods have been proposed for it with various types of outcomes, including failure time variables. Early techniques such as best subset selection and stepwise regression are computationally expensive and unstable, which motivated the development and extensive study of penalty-based variable selection procedures. Although many authors have investigated variable selection for various types of failure time data, to the best of our knowledge no such method exists for misclassified current status data. We propose a variable selection method for misclassified current status data arising from the proportional hazards model; in particular, a penalized EM algorithm is developed to obtain sparse estimators of the regression parameters.

Next, we discuss regression analysis of misclassified current status data with informative
observation times in Chapter 4. There, we introduce an unobserved latent variable to characterize the dependence between the failure time of interest and the observation time, model each of the two times with a Cox model involving this latent variable, and propose an EM algorithm based on Poisson latent variables to maximize the likelihood function. Under certain regularity conditions, we establish the large-sample properties of the resulting estimators, and numerical results show that the proposed method works well in practice.

Finally, we extend the method of the previous chapter to the more general linear transformation model and discuss regression analysis of misclassified current status data with informative observation times under this model. We again use the maximum likelihood method together with an EM algorithm to maximize the likelihood function. Because of the complexity of the model, however, more latent variables are introduced into the algorithm, and the maximization is correspondingly more involved. We also establish the large-sample properties of the estimators and evaluate the proposed method through numerical simulations.
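To make the data structure concrete, the likelihood of misclassified current status data can be written out. This is a minimal sketch, not the thesis's exact formulation: it assumes time-independent covariates $Z$, an observation time independent of $T$ (the Chapter 2 setting), a recorded status that depends only on the true status, and sensitivity $\phi$ and specificity $\psi$ (symbols introduced here for illustration). Let $\delta = I(T < C)$ be the true status, so $P(\delta = 1 \mid C, Z) = F(C \mid Z)$ for a continuous failure time, and let $\tilde\delta$ be the recorded status. Then

$$
P(\tilde\delta = 1 \mid C, Z) = \phi\, F(C \mid Z) + (1 - \psi)\{1 - F(C \mid Z)\},
$$

and under the additive hazards model $\lambda(t \mid Z) = \lambda_0(t) + \beta^\top Z$,

$$
F(t \mid Z) = 1 - \exp\{-\Lambda_0(t) - \beta^\top Z\, t\},
$$

so the log-likelihood for $n$ independent subjects is

$$
\ell_n(\beta, \Lambda_0) = \sum_{i=1}^{n} \Big[ \tilde\delta_i \log p_i + (1 - \tilde\delta_i) \log(1 - p_i) \Big], \qquad p_i = P(\tilde\delta_i = 1 \mid C_i, Z_i).
$$

Setting $\phi = \psi = 1$ recovers the usual current status likelihood, which is what a naive analysis maximizes.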
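The claim that ignoring misclassification biases the analysis can be checked with a small simulation. The sketch below is a deliberately simplified stand-in, not the thesis's semiparametric method: it assumes an exponential failure time with no covariates, a uniform observation time independent of $T$, and known sensitivity and specificity. All function names and parameter values are hypothetical. It fits the rate once with the naive likelihood ($\phi = \psi = 1$) and once with the misclassification-corrected likelihood, using a golden-section search on the log scale.

```python
import math
import random


def simulate(n, rate, sens, spec, c_max, seed):
    """Simulate misclassified current status data (hypothetical exponential model)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.expovariate(rate)            # true failure time T
        c = rng.uniform(0.0, c_max)          # observation time C, independent of T
        if t < c:                            # true status delta = I(T < C)
            observed = rng.random() < sens   # recorded positive with prob. sensitivity
        else:
            observed = rng.random() >= spec  # false positive with prob. 1 - specificity
        data.append((c, observed))
    return data


def log_lik(rate, data, sens, spec):
    """Log-likelihood of the recorded status; sens = spec = 1 gives the naive model."""
    total = 0.0
    for c, observed in data:
        f = 1.0 - math.exp(-rate * c)              # F(C) = P(T < C)
        p = sens * f + (1.0 - spec) * (1.0 - f)    # P(recorded status = 1 | C)
        p = min(max(p, 1e-12), 1.0 - 1e-12)        # guard against log(0)
        total += math.log(p) if observed else math.log(1.0 - p)
    return total


def fit_rate(data, sens, spec, lo=-4.0, hi=2.5, iters=40):
    """Golden-section search for the maximum-likelihood rate, on the log scale."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - g * (b - a), a + g * (b - a)
    f1 = log_lik(math.exp(x1), data, sens, spec)
    f2 = log_lik(math.exp(x2), data, sens, spec)
    for _ in range(iters):
        if f1 < f2:  # maximum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (b - a)
            f2 = log_lik(math.exp(x2), data, sens, spec)
        else:        # maximum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - g * (b - a)
            f1 = log_lik(math.exp(x1), data, sens, spec)
    return math.exp((a + b) / 2.0)


if __name__ == "__main__":
    data = simulate(n=10000, rate=1.0, sens=0.85, spec=0.90, c_max=3.0, seed=1)
    naive = fit_rate(data, sens=1.0, spec=1.0)        # ignores misclassification
    corrected = fit_rate(data, sens=0.85, spec=0.90)  # uses the assumed-known error rates
    print(f"naive={naive:.3f} corrected={corrected:.3f} (true rate 1.0)")
```

In this setup the corrected estimate should land close to the true rate of 1.0, while the naive estimate is pulled noticeably below it, because false positives at early observation times and false negatives at late ones distort the apparent shape of $F$.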
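The variable selection problem of Chapter 3 can be stated, in generic form, as maximizing a penalized log-likelihood. This is a sketch only. The abstract does not name the penalty, so $p_\lambda$ below is a placeholder; the LASSO penalty $p_\lambda(|\beta_j|) = \lambda |\beta_j|$ is one familiar choice. Under the proportional hazards model $F(t \mid Z) = 1 - \exp\{-\Lambda_0(t) e^{\beta^\top Z}\}$, with $\ell_n$ the observed-data log-likelihood of the misclassified data and $p$ the number of covariates:

$$
(\hat\beta, \hat\Lambda_0) = \arg\max_{\beta,\, \Lambda_0} \Big\{ \ell_n(\beta, \Lambda_0) - n \sum_{j=1}^{p} p_\lambda(|\beta_j|) \Big\}.
$$

Penalties that are singular at zero set some components of $\hat\beta$ exactly to zero, which is what produces sparse estimators. In a penalized EM algorithm the penalty is typically subtracted from the M-step objective, while the E-step is unchanged.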
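Chapter 4's EM algorithm is described as being based on Poisson latent variables. One widely used construction of this kind, assumed here, since the thesis may differ in its details, rests on a simple identity. If $N \sim \text{Poisson}(\mu)$ with $\mu = \Lambda(C)\exp(\beta^\top Z)$, then $P(N > 0) = 1 - e^{-\mu} = F(C \mid Z)$ under the Cox model, so the true status can be treated as $\delta = I(N > 0)$. Splitting $N$ into independent pieces $N_l \sim \text{Poisson}(\mu_l)$ with $\sum_l \mu_l = \mu$ (for example, one piece per jump of a step-function baseline) gives closed-form E-step quantities: $E[N_l \mid N > 0] = \mu_l / (1 - e^{-\mu})$ and $E[N_l \mid N = 0] = 0$. With misclassification the true status is itself unknown, so these are weighted by its posterior probability given the recorded status. The sketch below uses hypothetical function names, assumes known sensitivity and specificity, and omits the latent variable linking $T$ and $C$.

```python
import math


def posterior_true_positive(f, observed, sens, spec):
    """P(delta = 1 | recorded status), where f = P(delta = 1) = F(C | Z)."""
    if observed:
        num = sens * f
        den = sens * f + (1.0 - spec) * (1.0 - f)
    else:
        num = (1.0 - sens) * f
        den = (1.0 - sens) * f + spec * (1.0 - f)
    return num / den


def e_step_poisson(mus, observed, sens, spec):
    """Expected Poisson latent counts E[N_l | recorded status] for one subject.

    mus: positive rates mu_l of the independent pieces N_l; their sum is
    mu = Lambda(C) * exp(beta' Z), so F(C | Z) = 1 - exp(-mu).
    Assumes the recorded status depends on N only through delta = I(N > 0).
    """
    mu = sum(mus)
    f = 1.0 - math.exp(-mu)                   # P(N > 0) = P(delta = 1)
    w = posterior_true_positive(f, observed, sens, spec)
    # E[N_l | N > 0] = mu_l / (1 - exp(-mu)) and E[N_l | N = 0] = 0,
    # averaged over the posterior of the true status.
    return [w * m / f for m in mus]


if __name__ == "__main__":
    print(e_step_poisson([0.2, 0.3], observed=True, sens=0.9, spec=0.8))
```

In constructions of this type, the expected counts then serve as Poisson pseudo-responses in the M-step, which is typically what makes the updates for the baseline jumps available in closed form.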
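The linear transformation model of the final problem is not written out in the abstract. A common formulation, assumed here for illustration, is

$$
F(t \mid Z) = 1 - \exp\big[-G\{\Lambda(t)\, e^{\beta^\top Z}\}\big],
$$

where $G$ is a known increasing transformation. The logarithmic family

$$
G_r(x) = \frac{\log(1 + r x)}{r} \quad (r > 0), \qquad G_0(x) = x,
$$

contains the proportional hazards model ($r = 0$) and the proportional odds model ($r = 1$). For $r > 0$ it is a gamma frailty mixture:

$$
\exp\{-G_r(x)\} = (1 + r x)^{-1/r} = E\big[\exp(-\xi x)\big], \qquad \xi \sim \text{Gamma}(\text{shape } 1/r,\ \text{rate } 1/r),
$$

so conditionally on the frailty $\xi$ the model is again of Cox form. This is one way additional latent variables can enter an EM algorithm for such models, alongside Poisson variables of the kind used for the Cox case.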

Keywords/Search Tags:

Current status data, misclassification, maximum likelihood estimation, EM algorithm, variable selection, semiparametric models

Related items

1	Semiparametric Regression Analysis Of Current State Data With Misclassificatio
2	Research On Statistical Inference And Related Issues Of High-dimensional Semiparametric Regression Models
3	Robust Inference And Model Selection Methods For Some Semiparametric Models
4	EM Algorithm Parameter Estimation Problem Based On Group Testing
5	Order Restricted Weighted Estimation Method In Mixed Tests
6	Semiparametric Full Likelihood Inference For The Size Of Population From Capture-Recapture Data
7	Statistical Analyses And Applications For Missing Data Based On EM Algorithm
8	Semiparametric Mean Covariance Analysis Of Time Varging-coefficient Single-index Models For Longitudinal Data
9	Variable Selection Of High Dimensional Models With Longitudinal Data
10	Statistical Inference Of Competitive Risk Models Under Complex Censored Data