Confidence Intervals And Sample Size Determination For Proportion Difference Based On Partially Validated Series With Two Fallible Classifiers

Posted on: 2022-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: L M Wang
GTID: 2480306335984069
Subject: Statistics
Abstract/Summary:
In biomedical research, estimating the prevalence of a disease in a population is of great importance. Typically, subjects are first diagnosed with an inexpensive screening method, but screening methods can misclassify subjects and lead to incorrect conclusions. A gold standard does not misclassify, but it is usually expensive. To overcome the shortcomings of both, the double sampling method is often used to diagnose and classify subjects. In double sampling, all individuals are first classified by the screening method, and a random subset of them is then classified again by the gold standard. Because the individuals re-diagnosed by the gold standard reveal their true status, data obtained by double sampling are also called a partially validated series. In reality, however, almost no gold standard is completely correct, so the data obtained through double sampling are a partially validated series without a gold standard.

Based on partially validated series in which both classifiers are fallible, this thesis studies the construction of confidence intervals for the difference in disease prevalence (that is, the proportion difference) and the determination of sample size under control of the interval width. First, two double-sampling models are considered: model one, which satisfies conditional independence, and model two, which does not. Under these two models, twelve interval estimation methods for the proportion difference are considered, including intervals based on the method of variance estimates recovery (MOVER), the Wald confidence interval, the log-transformation confidence interval, the logit-transformation confidence interval, the Agresti-Coull confidence interval, the score confidence interval, the likelihood ratio confidence interval, Bootstrap resampling confidence intervals and the Bayesian confidence interval. Simulation studies evaluate the empirical coverage probability of these confidence intervals, the empirical coverage width, and the ratio of the mesial non-coverage probability to the non-coverage probability. The simulation results show that: 1) For model one, when the sample is small, the Wald confidence interval, the log-transformation confidence interval, the score confidence interval, the Bootstrap quantile confidence interval and the MOVER-based confidence intervals perform better. As the sample size increases, all methods except the Bootstrap normal approximation confidence interval and the Bootstrap quantile-t confidence interval give satisfactory results, so they are recommended for practical applications. 2) For model two, all confidence intervals except the Bootstrap quantile-t confidence interval, which is somewhat conservative, perform satisfactorily, so they are also recommended for practical applications.

Second, from the perspective of confidence interval width, this thesis studies sample size determination for the proportion difference and proposes sample size formulas under which the width of the Wald confidence interval, the score confidence interval and the likelihood ratio confidence interval is controlled within a specified range at a given confidence level. Simulation studies evaluate the empirical coverage probability and empirical coverage width of the confidence intervals at the estimated sample sizes. The simulation results show that: 1) Under the same parameter settings, the sample size required by each method under model one is usually greater than that required by the corresponding method under model two. 2) Under both models, at the estimated sample size, the empirical coverage probability of the confidence interval is very close to the given confidence level and the empirical coverage width is close to the specified width. The proposed sample size formulas are therefore recommended for practical applications.

Finally, the effectiveness of the proposed methods is verified through the analysis of malaria data.
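
As a rough illustration of two ideas named in the abstract, the MOVER construction and width-controlled sample size for a Wald interval, the following Python sketch handles the much simpler textbook case of two independent binomial samples with no misclassification. It is not the thesis's double-sampling model or its formulas. The function names (`wilson_interval`, `mover_difference`, `wald_sample_size`), the choice of Wilson score limits as the single-proportion intervals, and the equal-group-size assumption are all illustrative assumptions, not taken from the source.

```python
import math
from statistics import NormalDist


def wilson_interval(x, n, conf=0.95):
    """Wilson score interval for a single binomial proportion x/n."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def mover_difference(x1, n1, x2, n2, conf=0.95):
    """MOVER interval for p1 - p2 (assumed: two independent samples).

    The variance needed near each limit of the difference is "recovered"
    from the distance between each point estimate and its own limit.
    """
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_interval(x1, n1, conf)
    l2, u2 = wilson_interval(x2, n2, conf)
    d = p1 - p2
    lower = d - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper


def wald_sample_size(p1, p2, width, conf=0.95):
    """Per-group n so that the full Wald interval width is at most `width`.

    Assumes equal group sizes and planning values p1, p2; solves
    2 * z * sqrt((p1*(1-p1) + p2*(1-p2)) / n) <= width for n.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    var_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(4 * z**2 * var_sum / width**2)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    print(mover_difference(30, 100, 20, 100))
    print(wald_sample_size(0.3, 0.2, width=0.2))
```

In this simplified setting, the sample size function inverts the Wald width formula directly. Score and likelihood ratio intervals have no comparably simple closed-form width, so controlling their width generally needs a different derivation or a numerical search.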
Keywords/Search Tags:Partially validated data with fallible classifiers, Method of variance estimates recovery, Confidence intervals, Interval width, Sample size