
A Comparative Study Of Principal Component Analysis And 2D Principal Component Analysis

Posted on: 2015-01-29    Degree: Master    Type: Thesis
Country: China    Candidate: J H Wu    Full Text: PDF
GTID: 2207330422467787    Subject: Statistics
Abstract/Summary:
Dimension reduction is very important. On one hand, high-dimensional data cannot be used directly in some practical algorithms; dimension reduction helps overcome the so-called "curse of dimensionality" by lowering complexity, so that those algorithms become workable. On the other hand, high-dimensional data usually contains a great deal of noise and redundancy, and dimension reduction helps researchers find interesting data structure in a low-dimensional space and thereby understand the object of study better.

PCA and 2DPCA are two different methods for reducing the dimension of matrix data. This thesis compares PCA and 2DPCA on two important kinds of matrix data, multivariate time series data and high-frequency financial data, so that the results can serve as a reference for practical applications.

In the classification of multivariate time series, we first reduce the dimension and then classify the data in the low-dimensional space. For this purpose, the thesis compares the classification performance of PCA and 2DPCA, each combined with the Euclidean distance. Because the Euclidean distance is affected by dimensionality, the thesis proposes a new algorithm, called "2DPCA's Mahalanobis distance algorithm on the two-dimensional principal subspace", and compares it with the Euclidean distance algorithms and with PCA's Mahalanobis distance algorithm on 5 real-world multivariate time series data sets. The results show that the proposed algorithm is the best combination of dimension reduction method and classification distance.

In statistical modeling of high-frequency financial data, forecasting the volatilities of assets is very important. High-frequency data covering many assets yields a covariance matrix of volatilities for every trading day, and such covariance matrices usually have relatively high dimension, so forecasting the volatilities directly would require a very large number of parameters. We therefore first reduce the dimension and then model the low-dimensional data. The thesis makes an empirical comparison of PCA and 2DPCA, using 6 time series models for forecasting, such as AR, ARMA, and ARIMA. The results show that the low-dimensional data obtained from 2DPCA, combined with these models, forecasts better than the low-dimensional data obtained from PCA. We also find that the best combination of dimension reduction method and model is 2DPCA with the VAR model, which attains the minimum average reconstruction error.
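The abstract does not give implementation details, so the following is a minimal Python sketch of the two dimension reduction methods and one plausible reading of the proposed "2DPCA Mahalanobis distance on the two-dimensional principal subspace". All function names, shapes, and the per-column covariance estimate are illustrative assumptions, not the thesis's actual code.

```python
import numpy as np

def pca_project(samples, k):
    """PCA baseline: vectorize each m x n matrix and project onto the
    top-k eigenvectors of the (mn x mn) sample covariance matrix."""
    X = np.array([A.reshape(-1) for A in samples])       # (M, m*n)
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(samples)
    _, eigvecs = np.linalg.eigh(cov)                      # eigenvalues ascending
    W = eigvecs[:, ::-1][:, :k]                           # top-k directions
    return Xc @ W                                         # (M, k) score vectors

def tdpca_project(samples, d):
    """2DPCA: keep each sample as an m x n matrix and project it onto the
    top-d eigenvectors of the image covariance matrix
    G = (1/M) * sum_i (A_i - mean)^T (A_i - mean)."""
    A = np.asarray(samples)                               # (M, m, n)
    centered = A - A.mean(axis=0)
    G = np.einsum('ikp,ikq->pq', centered, centered) / len(A)   # (n, n)
    _, eigvecs = np.linalg.eigh(G)
    Xd = eigvecs[:, ::-1][:, :d]                          # (n, d) projection axes
    return centered @ Xd                                  # (M, m, d) feature matrices

def mahalanobis_2d(Y1, Y2, inv_covs):
    """One possible Mahalanobis-type distance between two projected feature
    matrices: sum, over the d projected columns, of the Mahalanobis distance
    between corresponding columns, where inv_covs[j] is an (m x m) inverse
    covariance estimated from the training scores of column j."""
    total = 0.0
    for j in range(Y1.shape[1]):
        diff = Y1[:, j] - Y2[:, j]
        total += np.sqrt(diff @ inv_covs[j] @ diff)
    return total
```

A nearest-neighbour classifier would then label a test series by the training series whose projected feature matrix minimizes this distance; replacing each inv_covs[j] with the identity recovers the plain column-wise Euclidean distance used as the baseline.

For the high-frequency application, the modelling step could look like the sketch below, building on tdpca_project above: flatten each day's 2DPCA feature matrix into a low-dimensional vector, fit a VAR model to the resulting series, and forecast one step ahead. The input data is a synthetic stand-in, and a VAR(2) from statsmodels is used only as an example of the model classes mentioned in the abstract.

```python
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
daily_matrices = rng.standard_normal((250, 10, 10))  # stand-in for 250 daily 10x10 volatility covariance matrices
scores = tdpca_project(daily_matrices, d=2)          # (250, 10, 2) reduced representation
series = scores.reshape(len(scores), -1)             # (250, 20) vector time series

fitted = VAR(series).fit(maxlags=2)                  # VAR(2), chosen here only for illustration
forecast = fitted.forecast(series[-fitted.k_ar:], steps=1)   # next-day reduced coordinates
```

Reconstruction error would then be measured after mapping the forecast back through the 2DPCA projection axes (and, for the PCA baseline, through its loading matrix).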
Keywords/Search Tags: PCA, 2DPCA, Euclidean distance, Mahalanobis distance, high-frequency data