Research On Statistical Inference Methods For Non-Probability Samples

Posted on:2023-04-06

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J B Yang

Full Text:PDF

GTID:1527306905955059

Subject:Statistics

Abstract/Summary:

As lifestyles and the pace of life change, traditional probability sampling surveys face new problems and challenges, most notably rising survey costs driven by sharply falling response rates. At the same time, the spread of smart devices and Internet technology has fundamentally changed how data are collected. Non-probability sampling, which obtains data through the Internet or other convenient channels, has increasingly become a focus of research on survey sampling methods. Because of its high response rate and low cost, it has been quickly adopted by government departments, industry, and research institutions.

Unlike traditional probability sampling, non-probability sampling often suffers from selection bias because the sampling frame is unknown and the sampling procedure is uncontrolled; that is, population units have different inclusion probabilities that depend on their own characteristics. Selection bias is the main source of coverage error and non-response error in non-probability surveys. It leaves the composition of the sample unbalanced relative to the target population, so the sample cannot be used directly for inference about that population.

Corrections for selection bias are primarily model-based and assume a relationship between the target variable and auxiliary variables. When the relevant auxiliary variables are also available from a probability sample, there are three main methods for inference from non-probability samples. The first is the quasi-randomization method, which corrects selection bias by estimating the propensity score of each unit in the non-probability sample, that is, the unit's inclusion probability conditional on the auxiliary variables. The second is the predictive model method, which fits a model for the target variable on the auxiliary variables in the non-probability sample, uses the auxiliary variables in the probability sample to predict the target variable, and then estimates population quantities from the probability sample. The third is the doubly robust method, which combines the first two to reduce the error caused by model misspecification and to improve the robustness of the estimate. In all three methods, correct model specification is the prerequisite for valid inference from non-probability samples.

This research improves on existing methods within the current inference framework for non-probability survey data, reducing the reliance on model specification and improving the effectiveness of inference from non-probability samples. First, it identifies the components of selection bias in model-based inference from non-probability samples and the methods available for reducing that bias. Second, it improves the models used in the three existing inference methods (the quasi-randomization method, the predictive model method, and the doubly robust method), which reduces the reliance on model specification and improves the accuracy of the estimators.

For the quasi-randomization method, a propensity score estimation method based on the probability density ratio is proposed. It estimates the propensity score without specifying a propensity score model and so avoids the bias caused by misspecifying that model. Based on the estimated propensity scores, an inverse probability weighted estimator of the population mean and its variance formula are given, and the asymptotic unbiasedness of the estimator is proved. In addition, to handle the extreme propensity scores that arise when auxiliary variables are unbalanced, the weight adjustment methods used for probability samples are introduced into inference from non-probability samples, which resolves the instability of the estimators caused by extreme values. For the predictive model method, the Bayesian additive regression trees model from machine learning is introduced into inference from non-probability samples, which improves the ability to capture interaction terms and higher-order terms in the model and improves the accuracy of the estimator. For the doubly robust method, the new propensity score estimation method is combined with the predictive model method to further reduce the estimation error caused by model misspecification. The asymptotic unbiasedness of the doubly robust estimator of the population mean is proved, and its variance formula is derived.

Finally, the dissertation takes the Chinese Longitudinal Healthy Longevity Survey (CLHLS) as the non-probability sample and the China Family Panel Studies (CFPS) as the reference probability sample, and applies the proposed inference methods to estimate the disability status of China's elderly population aged 65 and above. Key statistics are estimated, and the results of a comparative analysis of the methods are reported.
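
The three estimators described in the abstract can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration, not the dissertation's implementation: the population, the selection mechanism, and all variable names are hypothetical; the density ratio is approximated with shared histogram bins rather than the estimator proposed in the dissertation; and ordinary least squares stands in for Bayesian additive regression trees. The propensity score in each bin is taken as the number of non-probability sample units in the bin divided by the design-weighted population count in that bin from the probability sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite population (simulated, not from the dissertation).
N = 100_000
x = rng.normal(size=N)                      # auxiliary variable
y = 2.0 + 1.5 * x + rng.normal(size=N)      # target variable
truth = y.mean()

# Non-probability sample A: inclusion depends on x, which creates selection bias.
p_incl = 1.0 / (1.0 + np.exp(-(-3.0 + x)))
in_A = rng.random(N) < p_incl
x_A, y_A = x[in_A], y[in_A]
naive = y_A.mean()                          # unadjusted sample mean

# Probability sample B: simple random sample with design weights N / n_B.
# Only x is used from B; y is treated as unobserved there.
n_B = 2000
idx_B = rng.choice(N, size=n_B, replace=False)
x_B = x[idx_B]
d_B = np.full(n_B, N / n_B)

# 1) Quasi-randomization: propensity scores from a histogram density ratio.
#    Bins are the deciles of x in B (an illustrative choice).
inner = np.quantile(x_B, np.linspace(0.1, 0.9, 9))
bin_A = np.digitize(x_A, inner)             # bin index 0..9 for each unit in A
bin_B = np.digitize(x_B, inner)
count_A = np.bincount(bin_A, minlength=10)
N_hat = np.bincount(bin_B, weights=d_B, minlength=10)
pi_bin = count_A / N_hat                    # estimated propensity per bin
w_A = 1.0 / pi_bin[bin_A]                   # inverse probability weights
mean_ipw = np.sum(w_A * y_A) / np.sum(w_A)  # Hajek-type weighted mean

# 2) Predictive model: fit y ~ x on A (OLS stand-in), predict on B.
coef = np.polyfit(x_A, y_A, 1)
m_A = np.polyval(coef, x_A)
m_B = np.polyval(coef, x_B)
mean_pred = np.sum(d_B * m_B) / np.sum(d_B)

# 3) Doubly robust: prediction estimator plus weighted residual correction.
mean_dr = np.sum(w_A * (y_A - m_A)) / np.sum(w_A) + mean_pred

for name, est in [("truth", truth), ("naive", naive), ("IPW", mean_ipw),
                  ("prediction", mean_pred), ("doubly robust", mean_dr)]:
    print(f"{name:>14}: {est:.3f}")
```

In this simulated setting the unadjusted mean of the non-probability sample is pulled toward units with large x, while the three adjusted estimators should land much closer to the population mean. The doubly robust form stays consistent if either the propensity scores or the outcome model is adequate, which is the motivation the abstract gives for combining the two.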

Keywords/Search Tags:

Non-probability Samples, Density Ratio Estimation, Weight Adjustment, Bayesian Additive Regression Trees, Double Robust Estimator

Related items

1	Small Area Mean Estimation Based On Density Ratio Model
2	Regression Estimator With Triple Sampling The Double Non-Respondents
3	The Consistency Of Estimator For Probability Density Function Under M-WOD Sample
4	Empirical Likelihood On Density Ratio Models And Small Area Estimation
5	Complete Consistency Of Density Estimation For M-WOD Random Variables
6	Multilinear Model Based On Robust Estimation Methods and Variable Selection
7	Estimation Of The Number Of Missed People In The Census
8	Robust Small Area Estimation With Density Power Divergences
9	Parameter Estimation And Empirical Breakdown Point Of Robust Mixture Regression Model
10	Study On Robust Estimation And Outlier Detection Based On Multiple Linear Regression