
Impact Of Missing Data On Reliability And Validity And Case Analysis

Posted on: 2021-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Du
Full Text: PDF
GTID: 2507306230980149
Subject: Master of Applied Statistics
Abstract/Summary:
Questionnaires are currently the most commonly used method in social surveys and scientific research, and they are widely applied in surveys of political, economic, and social life. Their advantage is that they yield first-hand survey data, with respondents selected by simple random sampling. The purpose of a questionnaire is to infer characteristics of the overall population from a sample with a certain degree of accuracy, and to obtain information about respondents through structured questions. In actual research, however, unknown factors often lead respondents to leave questions unanswered, that is, nonresponse. The missing data caused by nonresponse affect statistical inference and can lead to wrong conclusions.

Reliability and validity are important indicators for evaluating the quality of a questionnaire. They assess whether the questionnaire clearly and intuitively reflects the objectives and significance of the survey, and whether the survey results obtained from it are accurate and scientifically sound. Missing data caused by nonresponse are therefore closely related to reliability and validity: if a large proportion of missing data or an improper handling method distorts the reliability and validity test results, the conclusions drawn from the questionnaire are likely to be biased.

This thesis builds on the theory of missing data imputation and of reliability and validity, and uses the questionnaire data of the 2018 Kunming Food Safety Satisfaction Survey Project. Missing data are generated artificially under the MCAR (missing completely at random) and MAR (missing at random) mechanisms at three missing rates, 10%, 30%, and 50%, for a total of 9 cases under the missing-data mechanisms. In each case, mean imputation, random imputation, regression imputation, EM imputation, multiple imputation, KNN imputation, decision tree imputation, random forest imputation, and BP neural network imputation are used to fill in the missing data, and reliability and validity analyses are then used to explore how the different imputation methods affect reliability and validity in each situation.

The case analysis shows that the optimal imputation method differs slightly across missing-data cases. KNN imputation is the most widely applicable, especially at low missing rates, where it yields the same reliability and validity as the original questionnaire data; its performance, however, gradually declines as the missing rate increases. At higher missing rates, random forest imputation and multiple imputation gradually overtake KNN imputation and become the optimal methods. It is therefore necessary to identify the missing mechanism and missing rate in the actual situation and to select an appropriate imputation method accordingly, so that the reliability and validity results stay as close as possible to those of the original complete data even when data are missing.
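The workflow described in the abstract (delete values at a fixed rate, impute them, then compare reliability against the complete data) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic Likert-style responses rather than the Kunming survey data, only MCAR deletion and mean imputation are shown, and Cronbach's alpha stands in for the reliability measure, which the abstract does not name. The function names `cronbach_alpha`, `make_mcar`, and `mean_impute` are hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) array without missing values."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the total score
    return k / (k - 1) * (1.0 - item_var_sum / total_var)

def make_mcar(data, rate, rng):
    """Blank out each cell independently with probability `rate` (MCAR)."""
    out = np.asarray(data, dtype=float).copy()
    out[rng.random(out.shape) < rate] = np.nan
    return out

def mean_impute(data):
    """Replace each missing cell with the observed mean of its column."""
    col_means = np.nanmean(data, axis=0)
    return np.where(np.isnan(data), col_means, data)

# Synthetic questionnaire (assumption): 300 respondents, 6 items on a 1-5 scale,
# all driven by one latent trait so that the items are correlated.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 1))
raw = 3 + latent + rng.normal(scale=0.8, size=(300, 6))
complete = np.clip(np.rint(raw), 1, 5)

alpha_full = cronbach_alpha(complete)
print(f"complete data: alpha = {alpha_full:.3f}")

results = {}
for rate in (0.1, 0.3, 0.5):
    imputed = mean_impute(make_mcar(complete, rate, rng))
    results[rate] = cronbach_alpha(imputed)
    print(f"missing rate {rate:.0%}: alpha after mean imputation = {results[rate]:.3f}")
```

Swapping `mean_impute` for another imputation routine and repeating the comparison at each missing rate gives the kind of method-by-rate comparison the abstract reports; a MAR variant would make the deletion probability depend on observed values in other items.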
Keywords/Search Tags:Questionnaire, Missing Data, Imputation method, Reliability, Validity