Font Size: a A A

Unsupervised learning: An information theoretic framework

Posted on:2009-08-08Degree:Ph.DType:Thesis
University:University of FloridaCandidate:Rao, Sudhir MadhavFull Text:PDF
GTID:2448390002996928Subject:Engineering
Abstract/Summary:PDF Full Text Request
The goal of this research is to develop a simple and unified framework for unsupervised learning. As a branch of machine learning, this constitutes the most difficult scenario where a machine (or learner) is presented only with the data without any desired output or reward for the action it takes. All that the machine can do then, is to somehow capture all the underlying patterns and structures in the data at different scales. In doing so, the machine has extracted maximal information from the data in an unbiased manner, which it can then use as a feedback to learn and make decisions.;Till now, the different facets of unsupervised learning namely, clustering, principal curves and vector quantization have been studied separately. This is understandable from the complexity of the field and the fact that no unified theory exists. Recent attempts to develop such a theory have been mired with complications. One of the issues is the imposition of models and structures on the data rather than extracting them directly through self organization of the data samples. Another reason is the lack of non-parametric estimators for information theoretic measures. Gaussian approximations generally follow, which fail to capture the higher order features in the data that are really of interest.;In this thesis, we present a novel framework for unsupervised learning called - The Principle of Relevant Entropy. By using concepts from information theoretic learning, we formulate this problem as a weighted combination of two terms - an information preservation term and a redundancy reduction term. This information theoretic formulation is a function of only two parameters. The user defined weighting parameter controls the task (and hence the type of structure) to be achieved whereas the inherent hidden scale parameter controls the resolution of our analysis. By including such a user defined parameter, we allow the learning machine to influence the level of abstraction to be extracted from the data. The result is the unraveling of "goal oriented structures" unique to the given dataset.;Using information theoretic measures based on Renyi's definition, we estimate these quantities directly from the data. One can derive fixed point update scheme to optimize this cost function, thus avoiding Gaussian approximation altogether. This leads to interaction of data samples giving us a new self organizing paradigm. An added benefit is the lack of unnecessary parameters (like step size) common in other optimization approaches. The strength of this new formulation can be judged from the fact that the existing mean shift algorithms appear as special cases of this cost function. By going beyond the second order statistics and dealing directly with the probability density function, we effectively extract maximal information to reveal the underlying structure in the data. By bringing clustering, principal curves and vector quantization under one umbrella, this powerful principle truly discovers the underlying mechanism of unsupervised learning.
Keywords/Search Tags:Unsupervised learning, Information theoretic, Data
PDF Full Text Request
Related items