PCA Of Mixed Data

Posted on: 2017-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
GTID: 2297330503466665
Subject: Statistics
Abstract/Summary:
Statistical surveys typically produce large amounts of both qualitative and quantitative data, and principal component analysis (PCA) is a natural choice for summarizing such data with a smaller number of indicators. However, traditional PCA assumes that the data satisfy the normality assumption and that all variables are measured on a numeric scale; only then are its results valid. Mixed data do not meet these conditions, so applying traditional PCA to them directly can lead to wrong conclusions. It is therefore necessary to study PCA for mixed data.

This study builds on PCA, combining it with data transformation, correlation feature extraction, and a centered version of the indicator matrix. Through preprocessing of the qualitative data, polyserial and polychoric correlations, the generalized singular value decomposition (GSVD), and related techniques, PCA is adapted to the analysis of mixed data. These methods are each applied, using R, to real statistical survey data for Gironde comprising 11 qualitative variables and 16 quantitative variables. The results show that dimension reduction is more effective when the characteristic matrix is used. Compared with clustering on the raw data, clustering on the principal components obtained by GSVD and multiple correspondence analysis (MCA) agrees better with the level of social development in the urban areas of Gironde.

The methods are compared in terms of their complexity and the interpretability of their results. The study shows that data transformation can solve the problem of quantifying qualitative data; that the new characteristic calculations can measure correlation in mixed data, which conventional correlation coefficients cannot; and that the GSVD and MCA methods can extract principal components from mixed data. These conclusions provide a reference for using PCA to reduce the dimension of mixed survey data and for subsequent analyses such as cluster analysis.
Keywords/Search Tags: Mixed Data, Principal Component Analysis, Characteristic Matrix, Multiple Correspondence Analysis
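
The indicator-matrix route mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's R code, and it rests on several assumptions: a PCAmix/FAMD-style weighting (quantitative columns standardized; each centered indicator column divided by the square root of its category proportion, which should correspond to a GSVD with uniform row weights 1/n and column weights n/n_s); plain power iteration with deflation swapped in for a GSVD routine so that the example needs only the standard library; and a small hypothetical data set, not the Gironde survey. All function names are illustrative.

```python
import math

def preprocess_mixed(quant, qual):
    """Build the weighted, centered data matrix for a PCA of mixed data.

    quant: rows of numeric values; qual: rows of category labels.
    Assumes no quantitative column is constant (its standard deviation is nonzero).
    """
    n = len(quant)
    columns = []
    # Quantitative columns: z-scores using the population standard deviation.
    for j in range(len(quant[0])):
        col = [row[j] for row in quant]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        columns.append([(x - mean) / sd for x in col])
    # Qualitative columns: one indicator per observed category, centered by the
    # category proportion p and divided by sqrt(p).
    for j in range(len(qual[0])):
        col = [row[j] for row in qual]
        for level in sorted(set(col)):
            p = col.count(level) / n
            columns.append([((1.0 if x == level else 0.0) - p) / math.sqrt(p)
                            for x in col])
    return [list(row) for row in zip(*columns)]

def covariance(Z):
    """Return (1/n) * Z^T Z for an already centered matrix Z."""
    n, p = len(Z), len(Z[0])
    return [[sum(Z[i][a] * Z[i][b] for i in range(n)) / n for b in range(p)]
            for a in range(p)]

def leading_eigenpairs(C, k=2, iters=500):
    """Approximate the k largest eigenpairs by power iteration with deflation."""
    p = len(C)
    C = [row[:] for row in C]
    pairs = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(p)]   # asymmetric starting vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the (unit-norm) vector estimates the eigenvalue.
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(p)) for a in range(p))
        pairs.append((lam, v))
        # Deflate: remove the component just found before the next pass.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return pairs

# Hypothetical toy data: 2 quantitative variables, 1 qualitative variable (3 levels).
quant = [[1.0, 10.0], [2.0, 9.0], [3.0, 12.0], [4.0, 15.0], [5.0, 11.0], [6.0, 18.0]]
qual = [["urban"], ["urban"], ["rural"], ["rural"], ["coastal"], ["coastal"]]

Z = preprocess_mixed(quant, qual)
C = covariance(Z)
# Total inertia is p1 + m - p2 = 2 + 3 - 1 = 4 here (p1 quantitative variables,
# m categories in total, p2 qualitative variables), up to floating-point rounding.
total_inertia = sum(C[i][i] for i in range(len(C)))
eigen = leading_eigenpairs(C, k=2)
# Component scores: project each row of Z onto the approximate eigenvectors.
scores = [[sum(z * v for z, v in zip(row, vec)) for _, vec in eigen] for row in Z]
for lam, _ in eigen:
    print("share of inertia:", round(lam / total_inertia, 3))
```

The scores could then be passed to a clustering routine, in the spirit of the cluster analysis the abstract describes. For real survey data a linear-algebra library's SVD would be a more accurate and faster choice than power iteration.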