The Estimation And Application Of Missing Data

Posted on:2004-09-05

Degree:Master

Type:Thesis

Country:China

Candidate:Z L Feng

GTID:2204360122465316

Subject:Epidemiology and Health Statistics

Abstract/Summary:

Missing data occur frequently in analyses of large databases, such as censuses, environmental monitoring and longitudinal medical studies. They create many difficulties because most analytic procedures, for instance randomized block analysis, repeated measures analysis and time series analysis, were not designed to handle them. Handling missing values by list-wise deletion not only discards much useful information but also brings two potentially serious problems: loss of efficiency, and bias due to differences between the observed and unobserved data, which can lead to misleading results. There are four traditional methods of dealing with missing data: list-wise deletion, pair-wise deletion, weighting techniques and single imputation. Single imputation includes five types of approaches: mean substitution, hot deck, conditional mean imputation, function imputation and differential residual error. The key problem is that these methods ignore the main uncertainty about the missing data. In this thesis, I describe the basic principles, strengths and limitations of the traditional methods. I also discuss a useful strategy for dealing with data sets with missing values: multiple imputation (MI), proposed by Rubin (1987). The procedure replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute. It is very important to classify the missing data by mechanism into three types: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR), because the mechanism determines which method is appropriate for imputing the missing values.
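
The MI idea described above (several plausible values per missing entry, with the spread between imputations carrying the uncertainty) can be sketched as follows. This is a minimal illustration, not the procedure used in the thesis: it assumes a single variable with MCAR missingness, uses the approximate Bayesian bootstrap as a stand-in imputation model instead of the data augmentation used by NORM or SAS, and all function names (`abb_impute`, `pool_rubin`, `mi_mean`) are hypothetical. The pooling step follows Rubin's combining rules.

```python
import random
import statistics


def abb_impute(values, rng):
    """Return one completed copy of `values` (None marks a missing entry).

    Assumption: approximate Bayesian bootstrap under MCAR. A donor pool is
    first resampled from the observed values, then each missing entry is
    filled with a random draw from that pool.
    """
    observed = [v for v in values if v is not None]
    donor_pool = rng.choices(observed, k=len(observed))
    return [v if v is not None else rng.choice(donor_pool) for v in values]


def pool_rubin(estimates, variances):
    """Combine m point estimates and their variances with Rubin's rules."""
    m = len(estimates)  # needs m >= 2 so the between-imputation variance exists
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance
    total = within + (1 + 1 / m) * between     # total variance of q_bar
    return q_bar, total


def mi_mean(values, m=20, seed=0):
    """Estimate a mean and its variance from m imputed datasets."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = abb_impute(values, rng)
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    return pool_rubin(estimates, variances)


if __name__ == "__main__":
    data = [4.1, None, 5.3, 6.0, None, 4.8, 5.5, None, 5.1, 4.6]
    mean, total_var = mi_mean(data)
    print(f"pooled mean = {mean:.3f}, total variance = {total_var:.4f}")
```

The between-imputation term is what single imputation lacks: with only one filled-in dataset there is no way to measure how much the estimate depends on the imputed values.
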
In the thesis, I introduce Bayesian theory, Markov chain Monte Carlo, data augmentation and the practical use of NORM, a program for imputing missing data, summarize the MI procedures available in many statistical software packages, and apply the MI procedure in SAS. In practice, I compared the results of analyzing four datasets (the complete dataset, the dataset with missing values, the MI dataset imputed with NORM and the MI dataset imputed with SAS) with the same computational programs. This comparison is important for both theoretical research on MI and its application in medical studies. The main conclusions are as follows. Traditional methods of imputing missing data are easy to apply, but they cannot express the uncertainty of the missing values; furthermore, they increase the sampling error, distort the distribution, etc. MI is at present the most popular and systematic method for imputing missing data. It allows us to make full use of the information in the data and extends the application of the method in medical research. We consider MI a valuable and important technique for handling missing data in modern statistical analysis software.
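
The conclusion that traditional single imputation distorts the distribution can be illustrated with mean substitution. The sketch below is only an illustration on simulated data, not a result from the thesis: the sample size, the normal distribution, the 30% MCAR deletion rate and the helper name `mean_substitute` are all assumptions chosen for the example.

```python
import random
import statistics


def mean_substitute(values):
    """Fill every missing entry (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


rng = random.Random(42)

# Simulated complete data (assumed normal, mean 50, standard deviation 10).
complete = [rng.gauss(50.0, 10.0) for _ in range(1000)]

# MCAR: each value is deleted with probability 0.3, independently of the data.
with_missing = [None if rng.random() < 0.3 else v for v in complete]

observed = [v for v in with_missing if v is not None]
filled = mean_substitute(with_missing)

sd_observed = statistics.stdev(observed)
sd_filled = statistics.stdev(filled)

if __name__ == "__main__":
    print(f"SD of observed values:       {sd_observed:.2f}")
    print(f"SD after mean substitution:  {sd_filled:.2f}")
```

The filled-in values add no spread around the mean, so the standard deviation of the completed data is always smaller than that of the observed values, and any standard error computed from it understates the uncertainty.
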

Keywords/Search Tags:

Missing data, Single imputation, Multiple imputation, Software application

Related items

1	Research On Application Of Missing Data Imputation In Medical Field
2	Multiple Imputation For Missing Data That Including A Ratio And Evaluation And Application The Effect Of Intervention That Implement The Secondary Prevention Of Cardiac Rehabilitation
3	Multiple Imputation For The Non-monotone Missing Data And The Application Of Cardiac Rehabilition Comprehensive Intervention Effect Evaluation
4	The Simulation Studies Of Imputation Methods Of Missing Data In The Scale
5	Computer Simulation Of Multiple Imputation For Analyzing Parallel Design And Crossover Design With Missing Data In Clinical Trial
6	Research Of Application Strategy And Imputation Fusion Method Of Missing Data For Gene Expression Profiling
7	Evaluations And Applications On Several Imputation Approaches Of Integrated Omics Data
8	A Simulated Comparitive Study And Application Of Statistical Methods In Datasets With Missing Values
9	Research And Application Of Missing Data Imputation Method Considering Biological Regulation Mechanism
10	Multiple Imputation And Mixed-effects Model Applied In Longitudinal Data With Missing Data