
Feature extraction and dimension reduction with applications to classification and the analysis of co-occurrence data

Posted on: 2002-07-06
Degree: Ph.D
Type: Dissertation
University: Stanford University
Candidate: Zhu, Mu
GTID: 1468390011499441
Subject: Statistics
Abstract/Summary:
The Internet has spawned a renewed interest in the analysis of co-occurrence data. Correspondence analysis can be applied to such data to yield useful information. A less well-known technique called canonical correspondence analysis (CCA) is suitable when such data come with covariates. We show that CCA is equivalent to a classification technique known as linear discriminant analysis (LDA). Both CCA and LDA are examples of a general feature extraction problem.

LDA as a feature extraction technique, however, is restrictive: it cannot pick up high-order features in the data. We propose a much more general method, of which LDA is a special case. Our method does not assume that the density function of each class belongs to any parametric family. We then compare our method in the quadratic discriminant analysis (QDA) setting with a competitor known as the sliced average variance estimator (SAVE). Our study shows that SAVE over-emphasizes second-order differences among classes.

Our approach to feature extraction is exploratory and has applications in dimension reduction, automatic exploratory data analysis, and data visualization. We also investigate strategies to incorporate the exploratory feature extraction component into formal probabilistic models. In particular, we study the problem of reduced-rank non-parametric discriminant analysis by combining our work in feature extraction with projection pursuit density estimation. In the process, we uncover an important difference between the forward and backward projection pursuit algorithms, previously thought to be equivalent.

Finally, we study related mixture models for classification and show that there is a direct connection between a particular formulation of the mixture model and a popular model for analyzing co-occurrence data, known as the aspect model.
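
The remark that LDA cannot pick up high-order features can be illustrated with a small sketch. The code below is not the dissertation's method; it is a textbook-style computation of Fisher's discriminant directions from the within-class and between-class scatter matrices, run on synthetic data chosen here purely for illustration. The function name `lda_directions` and both data sets are assumptions of this example.

```python
import numpy as np


def lda_directions(X, y, n_components=1):
    """Leading Fisher discriminant directions (hypothetical helper).

    Returns (eigenvalues, directions), where the directions are the
    columns of a (p, n_components) array, sorted by decreasing ratio of
    between-class to within-class scatter.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))  # within-class scatter
    B = np.zeros((p, p))  # between-class scatter
    for label in np.unique(y):
        Xk = X[y == label]
        mean_k = Xk.mean(axis=0)
        centered = Xk - mean_k
        W += centered.T @ centered
        diff = (mean_k - overall_mean).reshape(-1, 1)
        B += len(Xk) * (diff @ diff.T)
    # Eigenvectors of W^{-1} B maximize between-class over within-class scatter.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvals.real[order], eigvecs.real[:, order]


rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 200)

# Case 1: the classes differ in their means (a first-order difference).
shifted = np.vstack([
    rng.normal(size=(200, 2)),
    rng.normal(size=(200, 2)) + np.array([3.0, 0.0]),
])
vals_shift, dirs_shift = lda_directions(shifted, labels)

# Case 2: the classes share a mean and differ only in variance
# (a second-order difference).
scaled = np.vstack([
    rng.normal(size=(200, 2)),
    rng.normal(size=(200, 2)) * np.array([3.0, 1.0]),
])
vals_scale, dirs_scale = lda_directions(scaled, labels)
```

In the first case the leading eigenvalue is large and the direction lines up with the axis along which the means differ. In the second case the between-class scatter is close to zero, so the leading eigenvalue is close to zero as well and the returned direction carries little information, even though the two classes are clearly different in spread.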
Keywords/Search Tags: Data, Feature extraction, Co-occurrence, Classification, LDA