Semi-parametric Empirical Likelihood Statistical Inference With Case-control Data

Posted on: 2022-12-05; Degree: Doctor; Type: Dissertation
Country: China; Candidate: Z Sheng
GTID: 1484306773983709; Subject: Sociology and Statistics
Abstract/Summary:
Because of their cost-effectiveness, case-control studies are widely used in fields such as genetics and epidemiology. For example, they can be used to explore risk factors for rare diseases and to detect multiple risk factors associated with complex diseases simultaneously. In addition to data on disease status and exposure factors, auxiliary information related to the disease under study, such as prevalence information, family information and complex data-structure information, is often available in many research fields. For example, the first national health examination survey conducted in the United States from 1960 to 1994 indicated that the prevalence of class II obesity for men between the ages of 50 and 59 was about 0.6% (Flegal et al., 1998). Such auxiliary information is underutilized in standard case-control analyses, which costs efficiency. Analysis methods that make full use of auxiliary information with case-control data are therefore of great importance for disease research and genetic studies. This thesis focuses on these issues and addresses the following three aspects.

Firstly, this thesis considers genetic association studies under the Probit model. Probit and Logistic regression models are the most popular models for binary disease status in genetic association studies. For prospectively collected data, both models have been widely used when the model contains only fixed effects. In the presence of random effects, however, the likelihood function based on the Logistic mixed-effects model involves high-dimensional integration. Likelihood-based statistical inference is therefore rarely proposed, whereas score methods are common. In contrast, the likelihood function under the Probit mixed-effects model does not involve high-dimensional integration and has a closed form. Likelihood-based methods are therefore convenient and feasible, but not yet available in the literature. We combined prevalence information with the Probit mixed-effects model, proposed four empirical likelihood ratio statistics, and proved that, in the absence of a genetic effect, the limiting distribution of the empirical likelihood ratio statistic is a mixture of chi-square distributions with known weights. Our simulation study indicated that the empirical likelihood ratio tests have a remarkable power gain over the popular Logistic-model-based score tests, and that disease prevalence information can further enhance the power of the empirical likelihood ratio tests. We also analyzed a Kenya malaria data set using the proposed test and found that the gene ABO is associated with malaria, an association that was not significant under the other tests.

Secondly, this thesis investigates the genetic association study of a single variant based on family auxiliary information under the Probit model. Family information, i.e., parental disease status and covariates such as age and sex, is more readily available than genetic information, which can only be obtained by sequencing. The first part of this thesis addresses the situation where all subjects are independent and family information is ignored. Single-marker tests are also important in genetic association studies, alongside gene-based or pathway-based multiple-marker testing. Making full use of family auxiliary information to improve the power of the genetic association test is the second focus of this thesis. Based on the family auxiliary information, we constructed empirical likelihood ratio statistics and proved that, in the absence of genetic association, the limiting distribution is a chi-square distribution with 1 degree of freedom, regardless of whether the prevalence is known. The simulation results showed that the proposed empirical likelihood ratio test performs better than the score test that does not use the family auxiliary information. Access to the Biobank data is still under application, so a real-data analysis is not currently available for this part.

Thirdly, this thesis studies empirical likelihood inference for the Logistic regression model under two-phase sampling. Both of the problems above concern standard case-control studies, but actual study designs are rarely so simple. It is more practical to consider complex sampling designs (e.g., double sampling). Because of its cost-effectiveness and high efficiency in collecting data, two-phase case-control sampling has been widely used in epidemiological studies. Under the assumption of a Logistic regression model, we obtained two density ratio models, constructed a semi-parametric empirical likelihood framework, and proposed an empirical likelihood ratio method. We showed that the maximum empirical likelihood estimator is asymptotically normal and that the empirical likelihood ratio asymptotically follows a central chi-square distribution. We found that the maximum empirical likelihood estimator in our method coincides with that of Breslow and Holubkov (1997). Even so, the limiting distribution of the empirical likelihood ratio statistic and the likelihood-ratio-based interval and test are all new. Furthermore, we constructed new Kolmogorov-Smirnov-type goodness-of-fit tests to check the validity of the underlying Logistic regression model. Our simulation results and a real application showed that the likelihood-ratio-based interval and test have certain merits over their Wald-type counterparts and that the proposed goodness-of-fit test is valid.

In summary, for standard case-control data, we proposed an empirical likelihood ratio test based on the Probit mixed-effects model using prevalence information, and an empirical likelihood ratio test based on the Probit model using family information. We also developed a semi-parametric statistical method based on the Logistic model under two-phase sampling. This work not only develops efficient statistical methods for genetic association analysis and epidemiology but also enriches and promotes the development of semi-parametric statistics.
Keywords/Search Tags: Auxiliary information, Bootstrap, goodness-of-fit test, likelihood ratio test, mixed-effects model, Probit model, two-phase case-control sampling