
The Estimation And Application Of Missing Data

Posted on: 2004-09-05, Degree: Master, Type: Thesis
Country: China, Candidate: Z L Feng, Full Text: PDF
GTID: 2204360122465316, Subject: Epidemiology and Health Statistics
Abstract/Summary:
Missing data occur frequently in analyses of large databases, such as censuses, environmental inspections, and longitudinal medical studies. They create many difficulties because most analytic procedures, for instance randomized block analysis, repeated measures analysis, and time series analysis, were not designed to handle them. Handling missing values by list-wise deletion not only discards much useful information but also brings two potentially serious problems: loss of efficiency, and bias due to differences between the observed and unobserved data, which can lead to misleading results. There are four traditional methods of dealing with missing data: list-wise deletion, pair-wise deletion, weighting techniques, and single imputation. The last includes five types of approaches: mean substitution, hot deck, conditional mean imputation, function imputation, and differential residual error. The key problem is that these methods ignore the main uncertainty about the missing data. In discussing them, I give the basic principles, strengths, and limitations of the traditional methods. I also discuss a useful strategy for dealing with data sets with missing values: Multiple Imputation (MI), which was introduced by Rubin (1987). The procedure replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute. It is very important to classify the missing-data mechanism into three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), because the mechanism determines which method is appropriate for imputing the missing values.
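To make the MI idea concrete, the sketch below shows how results from several imputed datasets are usually combined with Rubin's rules: the pooled estimate is the average of the per-dataset estimates, and the total variance adds the between-imputation variance, inflated by (1 + 1/m), to the average within-imputation variance. The function name `pool_rubin` and the numbers are hypothetical illustrations, not values from the thesis.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data estimates and their variances with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical estimates of one coefficient from m = 5 imputed datasets.
estimates = [1.02, 0.95, 1.10, 0.99, 1.04]
variances = [0.040, 0.038, 0.042, 0.041, 0.039]
q_bar, total = pool_rubin(estimates, variances)
print(f"pooled estimate = {q_bar:.3f}, total variance = {total:.4f}")
```

The between-imputation term is what single imputation lacks: with only one filled-in dataset there is no way to measure how much the estimate depends on the imputed values.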
In this thesis, I introduce Bayesian theory, Markov chain Monte Carlo, data augmentation, and the practical use of NORM, a program for imputing missing data. I also summarize the MI procedures available in several statistical software packages and apply the MI procedure in SAS. In practice, I compared the results of analyzing four datasets (the complete dataset, the dataset with missing values, the MI dataset imputed with NORM, and the MI dataset imputed with SAS) using the same computational programs. This comparison matters for both theoretical research on MI and its application in medical studies. The main conclusions are as follows. Imputing missing data with traditional methods is easy, but it cannot express the uncertainty of the missing values; furthermore, it increases the sampling error and distorts the distribution, among other problems. MI is at present the most popular and systematic method for imputing missing data. It allows us to make full use of the information in the data and extends the application of the method in medical research. We consider MI a valuable and important technique for handling missing data in modern statistical analysis software.
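The claim that traditional single imputation distorts the distribution can be illustrated with mean substitution. The sketch below uses simulated data (not the thesis datasets), deletes about 30% of the values completely at random, and fills the gaps with the observed mean; the helper name `mean_substitute` and all parameters are assumptions made for illustration.

```python
import random
from statistics import mean, variance


def mean_substitute(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


random.seed(0)
# Simulated complete data; roughly 30% of values are then deleted (MCAR).
complete = [random.gauss(50.0, 10.0) for _ in range(1000)]
with_missing = [None if random.random() < 0.3 else v for v in complete]

observed = [v for v in with_missing if v is not None]
imputed = mean_substitute(with_missing)

# The mean is preserved, but the variance shrinks because every
# imputed value sits exactly at the center of the distribution.
print(f"observed variance = {variance(observed):.2f}")
print(f"imputed variance  = {variance(imputed):.2f}")
```

Because the imputed values add no spread, the filled-in dataset looks more precise than the data justify, which is the uncertainty that MI is designed to restore.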
Keywords/Search Tags: Missing data, Single imputation, Multiple imputation, Software application