
Probabilistic Principal Component Analysis Based on the Multivariate t-Distribution and Its Applications

Posted on: 2003-03-20    Degree: Master    Type: Thesis
Country: China    Candidate: J H Zhao    Full Text: PDF
GTID: 2190360092985939    Subject: Probability theory and mathematical statistics
Abstract/Summary:
Principal component analysis (PCA) is a popular technique for dimension reduction. Since its global linearity limits the scope of its application, several generalizations have recently been proposed in the literature. Probabilistic PCA, introduced by Tipping and Bishop [39, 40] and referred to here as Gaussian-PPCA, is a particularly important one of these efforts. In this thesis we derive a probabilistic PCA, which we call t-PPCA, for data sampled from a finite mixture of multivariate t-distributions, thereby obtaining a new general-purpose dimension reduction algorithm of value in both theory and application. Our main contributions are summarized as follows:

• Theory: Suppose the data are sampled from a mixture of m d-variate t-distributions, where each mixing component satisfies an isotropic factor analysis model (see §2.1). In Chapters 3 and 4 we derive the maximum likelihood estimates of the model parameters using an EM-type algorithm, which yields a new data projection and reconstruction algorithm, t-PPCA. When the degrees of freedom ν = ∞, t-PPCA is exactly Gaussian-PPCA. In the case m = 1, the projection is given by the principal component decomposition of a matrix S′ (see §4.1), which reduces to the sample covariance matrix S only when ν = ∞. This shows that the usual PCA is adequate only for normal data.

• Applications: Because the data are modelled as a finite mixture of t-distributions, t-PPCA is robust in applications. Compared with the Gaussian model, our approach gives better results, as the experiments in Chapter 5 demonstrate. In the handwritten English letter recognition experiment of §5.1, the error rate of t-PPCA is significantly smaller than that of Gaussian-PPCA (see Table 5.1). Both tables also indicate that projecting the data may be necessary before classification; this phenomenon is left for future study. In the data compression experiment of §5.2, a comparison of Fig. 5.2 with Fig. 5.3 shows that the image reconstructed by t-PPCA is of visibly better quality than that produced by Gaussian-PPCA.
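The m = 1 projection result lends itself to a small numerical illustration. The sketch below (Python/NumPy) is not the thesis's algorithm: the function name t_ppca_projection, the fixed ν, the pseudo-inverse, and the simplified weight and scatter updates are all illustrative assumptions. It alternates t-style downweighting of outlying points with recomputation of a weighted scatter matrix S′, whose top-q eigenvectors give the projection; as ν → ∞ every weight tends to 1 and S′ tends to the ordinary sample covariance S, recovering standard PCA.

```python
import numpy as np

def t_ppca_projection(X, q, nu=4.0, n_iter=50, tol=1e-6):
    """Illustrative m = 1 sketch (assumed form, not the thesis's EM):
    downweight heavy-tailed points with u_i = (nu + d)/(nu + delta_i),
    where delta_i is the squared Mahalanobis distance, then project
    onto the top-q eigenvectors of the weighted scatter matrix S'."""
    n, d = X.shape
    mu = X.mean(axis=0)
    u = np.ones(n)                               # all-ones weights = plain PCA
    for _ in range(n_iter):
        Xc = X - mu
        S_prime = (u[:, None] * Xc).T @ Xc / n   # weighted scatter matrix S'
        # squared Mahalanobis distance of each row under S'
        delta = np.einsum('ij,jk,ik->i', Xc, np.linalg.pinv(S_prime), Xc)
        u_new = (nu + d) / (nu + delta)          # t-distribution weights
        mu = (u_new[:, None] * X).sum(axis=0) / u_new.sum()
        if np.max(np.abs(u_new - u)) < tol:
            u = u_new
            break
        u = u_new
    # principal component decomposition of S': keep the top-q eigenvectors
    _, eigvecs = np.linalg.eigh(S_prime)         # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :q]                  # reverse to descending, keep q
    return (X - mu) @ W, W, mu

# Demo on synthetic heavy-tailed data (values are illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_t(df=3, size=(500, 10))
Z, W, mu = t_ppca_projection(X, q=2)
print(Z.shape)  # (500, 2)
```

Setting nu to a very large value (say 1e8) makes all weights effectively 1, so the returned W coincides with ordinary PCA of the sample covariance, matching the ν = ∞ reduction described above.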
Keywords/Search Tags: probabilistic PCA, image compression, handwritten letter recognition, mixture model, EM algorithm, t-distribution