Imputation Methods Of Missing Values For Compositional Data

Posted on: 2017-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: M Wang
Full Text: PDF
GTID: 2310330512451003
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Compositional data are a type of complex multivariate data used mainly to study the components of a whole. In recent years, compositional data have been widely used in fields such as geology, social science, and economics. Nonresponse in surveys and errors in data collection can lead to missing data. Missing values degrade the quality of statistical data and increase the variance of estimates, which makes statistical research less persuasive. Handling missing data is therefore extremely important. This thesis mainly introduces imputation methods for missing values in compositional data. A principal component regression imputation method (PCA) is introduced for compositional datasets with multicollinearity, and an imputation method using principal component analysis based on the minimum covariance determinant (MCD) estimator (MPCA) is proposed for compositional datasets that contain outliers.

This thesis is divided into five chapters. The first chapter introduces the background and significance of compositional data and summarizes the current state of research on missing values. The second chapter introduces the definition of compositional data and the associated algorithms. Three kinds of log-ratio transformation and a hyperspherical transformation are introduced. The chapter also describes in detail some simple imputation methods for missing values in general data and in compositional data. The third chapter develops two imputation methods: mean imputation on the simplex space, and principal component regression imputation, which is intended to address multicollinearity. Results on a real dataset and on simulated datasets indicate that the approach is preferable.

The fourth chapter, building on the third, introduces an imputation method using robust principal component analysis based on the minimum covariance determinant (MCD) estimator (MPCA). It is proposed for handling missing values in compositional data that contain outliers. The proposed method is tested on a real example and in simulation studies. The results show that the proposed method outperforms other imputation methods when the compositional data contain outliers. The fifth chapter summarizes the research work of this thesis and sets out its shortcomings and the problems that remain to be solved.
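
As a rough illustration of the ideas named in the abstract, the following Python sketch shows the closure operation, one log-ratio transformation, and a simple mean imputation on the simplex. It is not taken from the thesis. The abstract does not say which three log-ratio transformations are used, so the centered log-ratio (clr) is assumed here. The function names and the scaling rule used to fill a missing part are hypothetical choices made for this example only.

```python
import numpy as np

def closure(x, total=1.0):
    """Rescale each composition so its observed parts sum to `total`."""
    x = np.asarray(x, dtype=float)
    return total * x / np.nansum(x, axis=-1, keepdims=True)

def clr(x):
    """Centered log-ratio transform (assumed; the abstract names no specific one)."""
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

def clr_inv(z):
    """Map clr coordinates back to the simplex."""
    return closure(np.exp(z))

def simplex_center(X):
    """Mean on the simplex: closed vector of column-wise geometric means,
    computed from the complete rows only."""
    complete = X[~np.isnan(X).any(axis=1)]
    return closure(np.exp(np.log(complete).mean(axis=0)))

def impute_simplex_mean(X):
    """Hypothetical rule: fill each missing part with the center's value,
    scaled so it is consistent with the row's observed parts, then re-close."""
    X = np.array(X, dtype=float)  # work on a copy
    c = simplex_center(X)
    for row in X:  # each row is a view, so assignment updates X
        miss = np.isnan(row)
        if miss.any():
            scale = row[~miss].sum() / c[~miss].sum()
            row[miss] = c[miss] * scale
    return closure(X)

X = np.array([[0.20, 0.30, 0.50],
              [0.10, 0.40, 0.50],
              [0.25, np.nan, 0.50]])
X_imputed = impute_simplex_mean(X)
```

Because only ratios between parts carry information in compositional data, the sketch leaves the ratios among the observed parts of an incomplete row unchanged and only re-closes the row after filling the gap.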
Keywords/Search Tags:compositional data, missing values, mean imputation on the simplex space, principal component analysis imputation, minimum covariance determinant estimator