
Semi-parametric Method For Missing Covariate In Logistic Model And Its Application

Posted on: 2020-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: S M Fan
Full Text: PDF
GTID: 2417330599461949
Subject: Applied statistics
Abstract/Summary:
Missing data are common in practice. Handling them appropriately makes better use of the information in the data and supports better decisions, so research on missing-data methods has considerable practical value. In this thesis, the empirical likelihood method is used to estimate the conditional distribution function of the missing part, and the Logistic model with missing covariates is modeled parametrically. The semi-parametric method and several other methods are then applied to real data under the Logistic model, and the performance of the semi-parametric method is analyzed and verified. The content can be divided into two points:

1. Obtaining missing data by simulation. One approach starts from fully observed data and generates missing covariates at different missing rates under the MAR (missing at random) mechanism. The first fully observed data set is the Pacific Insurance claims data, with 2 covariates set missing and simulated missing rates of 5%, 10%, 15%, 20%, 25%, 30%, 35%, and 40%. The second is the German credit evaluation data, with 8 covariates set missing and simulated missing rates of 5%, 10%, 15%, 20%, 25%, and 30%. The other approach simulates complete data similar to the original missing data, sets 3 covariates missing, and generates missing covariate data from the complete data under three missingness mechanisms: MAR, MCAR (missing completely at random), and an improved NI (non-ignorable) mechanism.

2. Comparing methods on the simulated data. On the Pacific Insurance and German credit evaluation data with simulated missing covariates, the semi-parametric method is compared with complete-case analysis (CC), mean imputation, multiple imputation (MI), and the EM method. Performance is evaluated with three indicators: the bias, the standard deviation (SD), and the P value. On all three indicators, every method is affected by the missing rate, but the semi-parametric method is less sensitive to it than the others. On the data simulated to resemble the alcoholic fatty liver data, the semi-parametric method is compared with CC, regression imputation, MI, and EM. Taken together, the results show that the semi-parametric method is more effective, and that among the three mechanisms it performs best under MAR. On the non-alcoholic fatty liver data, the semi-parametric method and CC are applied, and the semi-parametric method performs significantly better than CC.

Although the semi-parametric method is not strongly affected by the missing rate, it still requires the missing mechanism to be assumed correctly, and it is limited by how far the initial value of the estimate is from the true value. When the dimension of the missing covariates is too large, the performance of the semi-parametric method also suffers.
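
The simulation design described above (covariates set missing at a chosen rate under MAR, then competing methods compared by the bias of the estimated coefficients) can be illustrated with a minimal Python sketch. It is not the thesis's code or its semi-parametric estimator: it uses synthetic data rather than the insurance or credit data, it compares only complete-case analysis (CC) and mean imputation, and all names, coefficient values, and the form of the missingness model (`simulate_logistic`, `mar_mask`, `fit_logistic`, `compare`) are assumptions made for illustration.

```python
import numpy as np


def simulate_logistic(n, beta, rng):
    """Draw two covariates and a binary outcome from a logistic model."""
    X = rng.normal(size=(n, 2))
    eta = beta[0] + X @ beta[1:]
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    return X, y


def mar_mask(x_obs, y, rate, rng):
    """Return a boolean mask, True where the covariate is set missing.

    The missingness probability depends only on always-observed
    quantities (x_obs and y), which is what makes the mechanism MAR.
    The 0.5 weights are arbitrary illustrative choices. The intercept
    is found by bisection so the expected missing rate equals `rate`.
    """
    score = 0.5 * x_obs + 0.5 * y
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        p = 1.0 / (1.0 + np.exp(-(mid + score)))
        if p.mean() < rate:
            lo = mid
        else:
            hi = mid
    p = 1.0 / (1.0 + np.exp(-(lo + score)))
    return rng.random(len(y)) < p


def fit_logistic(X, y, iters=25):
    """Fit a logistic regression with intercept by Newton-Raphson."""
    Z = np.column_stack([np.ones(len(y)), X])
    b = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Z @ b))
        grad = Z.T @ (y - p)
        hess = Z.T @ (Z * (p * (1.0 - p))[:, None])
        b = b + np.linalg.solve(hess, grad)
    return b


def compare(rate, n=2000, seed=0):
    """Return the realized missing rate and the bias of CC and mean imputation."""
    rng = np.random.default_rng(seed)
    beta = np.array([-0.5, 1.0, -1.0])  # assumed "true" coefficients
    X, y = simulate_logistic(n, beta, rng)
    miss = mar_mask(X[:, 0], y, rate, rng)  # second covariate goes missing

    # Complete-case analysis: drop every row with a missing covariate.
    cc = fit_logistic(X[~miss], y[~miss])

    # Mean imputation: fill missing values with the observed mean.
    X_imp = X.copy()
    X_imp[miss, 1] = X[~miss, 1].mean()
    mean_imp = fit_logistic(X_imp, y)

    return miss.mean(), cc - beta, mean_imp - beta


for rate in (0.05, 0.10, 0.15, 0.20, 0.25, 0.30):
    realized, cc_bias, imp_bias = compare(rate)
    print(f"rate={realized:.3f}  CC bias={np.round(cc_bias, 3)}  "
          f"mean-imputation bias={np.round(imp_bias, 3)}")
```

A single replicate like this shows only one draw of the bias. Reporting a bias and SD as in the abstract would mean repeating `compare` over many seeds and summarizing the estimates across replicates.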
Keywords/Search Tags:Semiparametric Method, Logistic Model, MAR Mechanism, Pacific Insurance Data, Nonalcoholic fatty liver data, German credit assessment data