Research And Application Of Principal Component Analysis Method Based On Interval Number Theory

Posted on: 2016-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Hou
GTID: 2270330473460251
Subject: Mathematical Statistics
Abstract/Summary:
Many practical problems require analyzing large samples with many variables, that is, processing high-dimensional data. Overlapping information among the variables is hard to avoid and the dominant factors are hard to identify, which makes the analysis challenging. If each variable is analyzed on its own, the result is not comprehensive; if variables are discarded blindly, the result is not accurate, because necessary information may be lost. In quantitative analysis, the goal is therefore to capture the essential features of the data from the useful information alone, i.e. to reduce the dimension of high-dimensional data without losing necessary information. Principal component analysis is an effective solution to this problem.

Principal component analysis is a mature method for information extraction and dimension reduction. By means of a linear transformation, it replaces the original variables with a smaller set of mutually uncorrelated variables while keeping information loss to a minimum. Because this transformation does not change the total variance, the new variables can be sorted in order of decreasing variance, and the leading ones are taken as the principal components. When analyzing a large sample, it is then enough to examine a small amount of information to grasp the major factors, which simplifies the computation and improves its efficiency. Classical principal component analysis, however, applies only to real-valued samples. Some samples are subject to measurement error or are inherently uncertain, so an observation cannot be represented by a single real number.
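The classical procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than code from the thesis: the function name `pca`, the synthetic data, and the choice of two retained components are all assumptions made for the example. It centers the data, takes the eigen-decomposition of the covariance matrix, sorts the components by decreasing variance, and reports the cumulative contribution rate.

```python
import numpy as np

def pca(X, k):
    """Classical PCA of a real-valued sample matrix X (rows = samples)."""
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :k]             # first k principal components
    contribution = np.cumsum(eigvals) / eigvals.sum()  # cumulative contribution rate
    return scores, eigvals, eigvecs, contribution

# Hypothetical data: 50 samples of 3 correlated variables.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.1],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(50, 3)) @ mixing

scores, eigvals, eigvecs, contribution = pca(X, k=2)
```

The eigenvalues sum to the trace of the covariance matrix, which is the sense in which the transformation leaves the total variance unchanged, and the component scores are mutually uncorrelated.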
To handle this measurement error and uncertainty, many researchers have introduced the concept of the interval number and extended principal component analysis to interval principal component analysis.

So far, methods of interval principal component analysis fall roughly into three categories: V-PCA, C-PCA, and MR-PCA, which describe an interval number by its endpoints or by its midpoint and radius. When interval data are prematurely approximated by real numbers, useful information is lost during the analysis, and the accuracy and reliability of the results are reduced. This thesis proposes two new methods that start from the interval sample matrix and are based on the covariance matrix (or correlation matrix). The first is principal component analysis based on empirical descriptive statistics, which treats each interval sample as a two-dimensional variable when computing with the interval matrix. It obtains the empirical descriptive statistics through the joint density distribution, and then computes the eigenvalues and orthonormal eigenvectors. The eigenvectors are converted into interval unit eigenvectors by a certain operation, from which the principal components are finally obtained. The second is principal component analysis based on the interval matrix, which computes the covariance matrix (or correlation matrix) with interval matrices and uses the Deif method to compute the eigenvalues and eigenvectors of the interval matrix, yielding the principal components. Finally, examples show that the improved methods require less computation, achieve a higher cumulative contribution rate, and give a better explanation of the principal components, so the goal of this thesis has been met.
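To make the midpoint and radius description of interval data concrete, the following is a minimal sketch of a center-based interval PCA in the spirit of C-PCA. It is not one of the two methods proposed in the thesis, and the function name `center_pca_intervals` and the sample data are hypothetical. Ordinary PCA is run on the interval midpoints, and each interval is then pushed through the same linear map: for a loading vector u, a box with midpoint c and radius r projects onto the interval u·c ± |u|·r.

```python
import numpy as np

def center_pca_intervals(lower, upper, k):
    """PCA on interval midpoints, followed by projection of the full intervals.

    lower, upper: arrays of shape (n_samples, n_variables) with lower <= upper.
    Returns the lower and upper bounds of the first k component scores
    and the loading matrix U.
    """
    centers = (lower + upper) / 2.0          # interval midpoints
    radii = (upper - lower) / 2.0            # interval radii (non-negative)
    mean = centers.mean(axis=0)
    cov = np.cov(centers - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
    U = eigvecs[:, order[:k]]                # loadings of the first k components
    mid_scores = (centers - mean) @ U        # scores of the midpoints
    spread = radii @ np.abs(U)               # half-width of each projected interval
    return mid_scores - spread, mid_scores + spread, U

# Hypothetical interval-valued sample: 4 observations of 3 variables.
lower = np.array([[1.0, 2.0, 0.5],
                  [2.0, 2.5, 1.0],
                  [3.0, 4.0, 1.5],
                  [4.0, 4.5, 2.5]])
width = np.array([[0.5, 0.2, 0.1],
                  [0.3, 0.4, 0.2],
                  [0.6, 0.1, 0.3],
                  [0.2, 0.5, 0.4]])
upper = lower + width

score_lo, score_hi, U = center_pca_intervals(lower, upper, k=2)
```

Because only the midpoints enter the covariance matrix, the interval widths do not influence the loadings in this sketch. That is the kind of information loss the abstract attributes to approximating intervals by real numbers too early.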
Keywords/Search Tags: Variance, Principal Component Analysis, Interval Number, Principal Component