A Study On The Nonresponse Of Sample Survey

Posted on: 2022-09-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S L Hao
GTID: 1487306317992849
Subject: Statistics
Abstract/Summary:
Nonresponse is one of the main factors affecting the reliability and data quality of sample surveys. Whether a survey is conducted face to face, as in door-to-door interviews or street intercepts, or online, respondents' nonresponse to a questionnaire is driven by many complex factors. A sample survey therefore needs not only a sound sampling plan and sound techniques at the design stage to reduce sampling error, but also close attention to the far-reaching impact of nonresponse at the implementation stage. Nonresponse to a questionnaire or to an individual question comes mainly from two sources: random nonresponse and subjective nonresponse. Random nonresponse can be reduced or avoided through sampling design, whereas subjective nonresponse may stem from the respondent's sensitivity to the survey topic, concern for privacy, or value judgment about the survey itself. External factors such as the complexity of the sampled group, the mobility of sample units, and the survey environment also have a considerable influence on nonresponse. Traditional indicators such as the questionnaire recovery rate and the response rate are therefore not sufficient on their own to evaluate survey quality and data quality.

Sample surveys of the floating population are a typical case. The floating population is a special group that exists against the background of China's fixed household registration system, and it has made significant contributions to China's economic and social development. Surveys of the floating population are affected by complex socio-economic factors, so nonresponse is more pronounced, which inevitably leads to missing data and lowers data quality. Taking item nonresponse in the floating population sample survey as its starting point, the paper analyzes the nonresponse problem in sample surveys in depth, examines the causes of subjective nonresponse, and sets out the missing-data mechanisms and the methods of data imputation. Using the floating population survey data as the research object, it measures the representativeness of response in the floating population sample survey in order to evaluate survey quality, and it studies in depth how to handle item nonresponse in that survey.

Chapter 2 is the logical starting point of the paper. It analyzes in detail the mechanisms by which nonresponse in sample surveys produces missing data, and provides the theoretical basis for the study of nonresponse. Chapter 3 studies methods for treating item nonresponse and provides the methodological support for handling it. Chapter 4 describes the design of the sample survey of China's floating population, which is the basis for studying item nonresponse in that survey, and reflects on the causes of item nonresponse from the perspective of sampling design. Chapter 5 is the empirical study of item nonresponse imputation in the floating population survey. It measures item nonresponse in the survey empirically and imputes the missing data using single imputation, multiple imputation, and structural logic imputation. Chapter 6 presents the conclusions, policy recommendations, and directions for further research on item nonresponse in the sample survey of the floating population.

The work on nonresponse in sample surveys and on data imputation methods covers the following four aspects.

First, the paper explores in depth the mechanisms by which nonresponse produces missing data. Respondents' nonresponse in a survey inevitably leads to missing data, and the first step in dealing with missing data is to identify the missing-data mechanism behind the nonresponse. The paper analyzes in detail three missing-data mechanisms: missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR). This analysis is the theoretical basis for the subsequent study of item nonresponse. Because nonresponse biases estimators of population characteristics, the paper combines nonresponse with sampling theory to explore its impact on those estimators.

Second, the paper improves the theoretical basis and application of R statistics for evaluating sample survey quality, designs an algorithm for computing them, and constructs a new system for evaluating sample surveys. The existing literature on survey quality relies mainly on the response rate, on the assumption that a higher response rate means a more representative response sample. Related studies have shown, however, that there is no necessary relationship between the response rate and the representativeness of the response sample. R statistics can measure that representativeness, and can describe and estimate survey quality at the more detailed level of individual questions. The paper treats R statistics as an important supplement to the response rate, extends the R statistics and partial R statistics used to measure response representativeness, and constructs the standard error and confidence interval of the R statistics. It then carries out an empirical analysis based on the 2017 dynamic monitoring data on China's floating population, which advances the practical application of the R statistics. The paper also provides a computer program that computes the R statistics and the partial R statistics. Note that the R statistics and partial R statistics for response representativeness are a unified integration of the existing literature, whereas the calculation of the standard error and confidence interval is an important supplement contributed by this paper.

Third, the paper systematically studies methods for handling item nonresponse. Imputation is a common way of dealing with item nonresponse. The paper first analyzes the theoretical mechanism of imputation in detail, then systematically compares the existing single imputation, EM imputation, multiple imputation, and fractional imputation methods, discussing their applicable conditions, advantages, and disadvantages. Finally, it proposes the structural logic imputation method, a comprehensive, problem-oriented method that comprises categorical imputation, associative imputation, and multiple optimal imputation. Structural logic imputation is a sample-learning method: machine learning algorithms are used to learn classification rules, association rules, and multiple optimization rules. More specifically, individual group characteristics, statistical characteristics, and individual behavior characteristics are learned from the sample, and missing values are then imputed on the basis of these characteristics. The paper builds a complete system of item nonresponse imputation on this theory.

Fourth, the paper measures item nonresponse in the floating population survey empirically, in order to evaluate the impact of nonresponse on the quality of that survey. Using single imputation, multiple imputation, and structural logic imputation, it imputes the missing data on the factors influencing the floating population's willingness to stay, and evaluates the imputation methods from several perspectives. Specifically, in addition to the traditional variance comparison, the imputed data are compared with the original data. The paper proposes using Kappa consistency analysis in the structural logic imputation analysis to compare the advantages and disadvantages of the imputation methods.

Nonresponse in sample surveys directly causes missing data to varying degrees, and so affects survey quality to varying degrees. The innovations of this paper are as follows. It improves the theoretical basis and algorithm design of the R statistics and uses them as a supplementary indicator alongside the response rate to assess nonresponse in the floating population sample survey, and thereby to evaluate survey quality. It also improves the multiple imputation method. Finally, it proposes a new structural logic imputation method to impute, and evaluate, item nonresponse in sample surveys.
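
The abstract does not define the R statistics it refers to. As a minimal sketch, assume they follow the representativeness indicator commonly used in the survey literature, R = 1 - 2·S(ρ̂), where S(ρ̂) is the standard deviation of the estimated response propensities. The function name `r_indicator`, the use of per-stratum response rates as propensity estimates, the equal design weights, and the toy data are all assumptions made for illustration; they are not the dissertation's algorithm and do not cover its partial R statistics, standard errors, or confidence intervals.

```python
from collections import defaultdict
from math import sqrt


def r_indicator(strata, responded):
    """Return R = 1 - 2 * S(rho_hat) for one sample.

    strata    -- auxiliary category of each sampled unit (hypothetical input)
    responded -- 1/True if the unit responded, 0/False otherwise
    """
    if len(strata) != len(responded) or len(strata) == 0:
        raise ValueError("strata and responded must be non-empty and equal in length")

    # Count respondents and sampled units per stratum.
    counts = defaultdict(lambda: [0, 0])
    for stratum, flag in zip(strata, responded):
        counts[stratum][0] += 1 if flag else 0
        counts[stratum][1] += 1

    # Assumption: a unit's response propensity is its stratum's response rate.
    rho = [counts[s][0] / counts[s][1] for s in strata]

    n = len(rho)
    mean = sum(rho) / n
    # Assumption: equal design weights and a divisor of n (not n - 1).
    sd = sqrt(sum((p - mean) ** 2 for p in rho) / n)
    return 1.0 - 2.0 * sd


# Toy data: both samples have a 50% response rate overall.
strata = ["urban"] * 4 + ["rural"] * 4
balanced = [1, 0, 1, 0, 1, 0, 1, 0]  # same response rate in both strata
skewed = [1, 1, 1, 1, 0, 0, 0, 0]    # only urban units respond

print(r_indicator(strata, balanced))  # 1.0
print(r_indicator(strata, skewed))    # 0.0
```

The two toy samples share the same response rate but sit at opposite ends of the indicator, which illustrates why a response rate alone may say little about how representative the responding sample is.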
Keywords/Search Tags: sampling survey, nonresponse, R statistics, structural logic imputation, floating population, empirical analysis