
Impute Missing Values For Mixed Data

Posted on: 2014-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: X L Hu
Full Text: PDF
GTID: 2250330425973666
Subject: Applied Economics
Abstract/Summary:
Missing data is a recurrent problem in applied analysis. Most statistical methods cannot be applied directly to incomplete data, which greatly reduces the statistical value of such data. Commonly used imputation methods are designed for either continuous variables or categorical variables. This thesis discusses imputation for mixed data. It first introduces principal component analysis (PCA), principal component analysis of mixed data (PCAMIX), and single imputation with PCAMIX. It then presents the cross-validation method for choosing the number of dimensions in PCA. Finally, to account for the variability of the missing values, it introduces the idea of multiple imputation.

The main work of the thesis is summarized as follows. First, building on the theory of cross-validation for choosing the number of dimensions in PCA, we derive a cross-validation procedure for choosing the number of dimensions in PCAMIX. Second, building on the idea of multiple imputation, we derive a multiple imputation method based on PCAMIX. Finally, we explore the properties of PCAMIX imputation, covering both single imputation and multiple imputation. For single imputation, we compare imputing continuous and categorical variables separately with imputing them together, and we examine how the signal-to-noise ratio (SNR) and the correlation coefficient affect the imputation. For multiple imputation, we use projections to explore how the data structure and the missing rate influence the imputed data, and we explain the multiple imputation process in terms of Rubin's combining rules.

The conclusions are summarized as follows. For single imputation: the error increases as the missing rate increases; imputing continuous and categorical variables together performs better than imputing them separately; in general, the higher the signal-to-noise ratio, the better the imputation; and the higher the correlation coefficient between variables, the better the imputation. For multiple imputation: when the data structure is strong, the imputed values agree more closely and are more reliable; likewise, when the missing rate is low, the imputed values agree more closely and are more reliable.
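
The abstract refers to Rubin's combining rules for multiple imputation without stating them. As a minimal sketch, assuming the standard form of the rules for a scalar parameter (and not the thesis's own code), the m point estimates and their within-imputation variances can be pooled as follows. The function name `pool_rubin` and the toy numbers in the usage line are hypothetical and chosen only for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m scalar estimates from m imputed data sets (standard Rubin's rules).

    estimates: point estimate of the parameter from each imputed data set
    variances: squared standard error of that estimate in each data set
    Returns (pooled estimate, within variance, between variance, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # average within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance of the pooled estimate
    return q_bar, w, b, t


# Hypothetical inputs: three imputed data sets, each giving an estimate and its variance.
q_bar, w, b, t = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term b reflects how much the estimates disagree across imputed data sets, so it is the part of the total variance that would be expected to grow with the missing rate.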
Keywords/Search Tags: missing data, mixed data, principal component analysis of mixed data, choice of the number of dimensions, multiple imputation