Generalized statistical methods for mixed exponential families

Posted on: 2010-08-15

Degree: Ph.D.

Type: Dissertation

University: University of California, San Diego

Candidate: Levasseur, Cecile

GTID: 1440390002476258

Subject: Engineering

Abstract/Summary:

This dissertation considers the problem of learning the underlying statistical structure of complex data sets, both to fit a generative model and to support supervised and unsupervised data-driven decision making. Using properties of exponential family distributions, it establishes a new unified theoretical model called Generalized Linear Statistics.

The complexity of such data generally stems from two facts: the data have a large number of components, and those components are often of mixed data types (i.e., some components may be continuous, with different underlying distributions, while others may be discrete, such as categorical, count, or Boolean). Such complex data sets are typical in drug discovery, health care, and fraud detection.

The proposed statistical modeling approach generalizes and amalgamates techniques from classical linear statistics within a unified framework referred to as Generalized Linear Statistics (GLS). This framework includes techniques drawn from latent variable analysis as well as from the theory of Generalized Linear Models (GLMs), and it uses exponential family distributions to model the various mixed (continuous and discrete) data types found in complex data sets.
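The use of exponential family distributions for mixed continuous and discrete components can be made concrete with a minimal sketch. It assumes three common component types: unit-variance Gaussian for continuous data, Bernoulli for Boolean data, and Poisson for counts (categorical components would need a multinomial form and are omitted here). Each family is written in natural form p(x | theta) = h(x) exp(theta * x - G(theta)); the derivative G' maps a natural parameter to a mean, and its inverse, the canonical link, maps a mean back into parameter space. The class `ExpFamily`, the function `mixed_neg_log_lik`, and the assumption that components are conditionally independent given their natural parameters are illustrative choices, not taken from the dissertation.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ExpFamily:
    """One-parameter exponential family p(x | theta) = h(x) * exp(theta * x - G(theta))."""
    name: str
    cumulant: Callable[[float], float]  # G(theta), the log-partition function
    mean: Callable[[float], float]      # G'(theta): parameter space -> mean (data) space
    link: Callable[[float], float]      # (G')^{-1}: mean space -> parameter space

    def neg_log_lik(self, x: float, theta: float) -> float:
        """Negative log-likelihood of one observation, dropping the -log h(x) term."""
        return self.cumulant(theta) - theta * x


def _softplus(t: float) -> float:
    # log(1 + e^t), written to avoid overflow for large |t|
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def _sigmoid(t: float) -> float:
    # numerically stable logistic function 1 / (1 + e^-t)
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


# Assumed component types: unit-variance Gaussian, Bernoulli (Boolean), Poisson (count).
GAUSSIAN = ExpFamily("gaussian", lambda t: 0.5 * t * t, lambda t: t, lambda m: m)
BERNOULLI = ExpFamily("bernoulli", _softplus, _sigmoid, lambda m: math.log(m / (1.0 - m)))
POISSON = ExpFamily("poisson", math.exp, math.exp, math.log)


def mixed_neg_log_lik(record: Sequence[float], thetas: Sequence[float],
                      families: Sequence[ExpFamily]) -> float:
    """Joint NLL of one mixed-type record, assuming its components are
    conditionally independent given their natural parameters."""
    if not (len(record) == len(thetas) == len(families)):
        raise ValueError("record, thetas and families must have equal length")
    return sum(f.neg_log_lik(x, t) for x, t, f in zip(record, thetas, families))


# Usage: mean-space values mapped into parameter space by each canonical link.
print(GAUSSIAN.link(2.3), BERNOULLI.link(0.8), POISSON.link(4.0))  # 2.3, then log 4 ≈ 1.386 twice
```

Because the cumulant differs per component type, the same natural-parameter value means different things for different columns; this is why the mapping into parameter space, rather than the raw data, is the common ground on which linear tools can operate. Note that a raw Boolean observation of exactly 0 or 1 has no finite image under the logit link, which is one reason such methods estimate natural parameters rather than transforming data points directly.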
The methodology exploits the connection between data space and parameter space present in exponential family distributions, and solves a nonlinear problem by applying classical linear statistical tools to data that have been mapped into parameter space.

A key aspect of the GLS framework is that the natural parameter of the exponential family distributions is often assumed to be constrained to a lower-dimensional latent variable subspace, reflecting the belief that the intrinsic dimensionality of the data is smaller than the dimensionality of the observation space.

The framework is equivalent to assuming a computationally tractable, mixed data-type hierarchical Bayes graphical model with latent variables constrained to a low-dimensional parameter subspace. We demonstrate that exponential family Principal Component Analysis, Semi-Parametric exponential family Principal Component Analysis, and Bregman soft clustering are not separate, unrelated algorithms but different manifestations of model assumptions and parameter choices made within this common GLS framework. As a result, these algorithms extend readily to the important mixed data-type case. A critical advantage of the framework is that it maps high-dimensional, mixed-type data components to low-dimensional latent variables of a common type; these latent variables can then be used to perform regression or classification much more simply, with well-known continuous-parameter classical linear techniques.

Classification results are presented for synthetic data and for data sets from the University of California, Irvine machine learning repository.
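The low-dimensional constraint on the natural parameter can be illustrated with a minimal sketch of exponential family PCA, one of the special cases named above, applied to mixed column types. Assumptions: each column is unit-variance Gaussian, Bernoulli, or Poisson; the natural parameter matrix is factored as Theta = A V with no offset term; and the factors are fitted by plain gradient descent with step halving, which is a generic choice and not necessarily the estimation procedure used in the dissertation. The names `FAMILIES`, `nll_and_residual`, `mixed_epca`, and `demo`, as well as the synthetic data and response, are hypothetical.

```python
import numpy as np

# Cumulant G and mean function G' for each assumed column type.
FAMILIES = {
    "gaussian": (lambda t: 0.5 * t**2, lambda t: t),             # unit variance
    "bernoulli": (lambda t: np.logaddexp(0.0, t),                 # log(1 + e^t)
                  lambda t: 0.5 * (1.0 + np.tanh(0.5 * t))),      # stable sigmoid
    "poisson": (np.exp, np.exp),
}


def nll_and_residual(X, Theta, kinds):
    """Total NLL sum_ij [G_j(theta_ij) - x_ij * theta_ij] and its gradient with
    respect to Theta, which is the mean-space residual G_j'(theta_ij) - x_ij."""
    loss = 0.0
    R = np.empty_like(Theta)
    for j, kind in enumerate(kinds):
        G, dG = FAMILIES[kind]
        t = Theta[:, j]
        loss += float(np.sum(G(t) - X[:, j] * t))
        R[:, j] = dG(t) - X[:, j]
    return loss, R


def mixed_epca(X, kinds, q, lr=0.01, iters=500, seed=0):
    """Fit a rank-q natural parameter matrix Theta = A @ V by gradient descent
    with step halving. Returns latent coordinates A (n x q), loadings V (q x d)
    and the loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = 0.1 * rng.standard_normal((n, q))
    V = 0.1 * rng.standard_normal((q, d))
    loss, R = nll_and_residual(X, A @ V, kinds)
    history = [loss]
    for _ in range(iters):
        grad_A, grad_V = R @ V.T, A.T @ R  # chain rule through Theta = A @ V
        while True:
            A_new, V_new = A - lr * grad_A, V - lr * grad_V
            new_loss, new_R = nll_and_residual(X, A_new @ V_new, kinds)
            if new_loss <= loss or lr < 1e-12:
                break
            lr *= 0.5  # step too large (or overflowed): halve it and retry
        A, V, loss, R = A_new, V_new, new_loss, new_R
        history.append(loss)
    return A, V, history


def demo(seed=1, n=60):
    """Hypothetical synthetic example: 3 mixed-type columns driven by a 2-D latent."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, 2))                        # true latent coordinates
    W = np.array([[1.0, -0.5, 0.4], [0.3, 1.2, -0.6]])     # true loadings
    T = Z @ W                                              # true natural parameters
    X = np.column_stack([
        T[:, 0] + rng.standard_normal(n),                  # continuous column
        rng.binomial(1, 1.0 / (1.0 + np.exp(-T[:, 1]))),   # Boolean column
        rng.poisson(np.exp(T[:, 2])),                      # count column
    ]).astype(float)
    A, V, history = mixed_epca(X, ["gaussian", "bernoulli", "poisson"], q=2)
    # Downstream step: ordinary least squares on the low-dimensional latent features.
    y = Z[:, 0] + 0.1 * rng.standard_normal(n)             # hypothetical response
    coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), A]), y, rcond=None)
    return X, A, V, history, coef
```

The last step of `demo` mirrors the workflow described in the abstract: once every record has been mapped to a common continuous latent coordinate, an ordinary linear technique (here least squares) can be applied regardless of the original column types. The factorization is not unique, since A M and M^{-1} V give the same Theta for any invertible q x q matrix M, so A should be read as coordinates for the latent subspace rather than as unique scores. The semi-parametric and Bregman soft clustering variants would require different assumptions on the latent variables, which this sketch does not attempt.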

Keywords/Search Tags:

Data, Statistical, Exponential, Mixed, Linear, Generalized, Model

Related items

1	Mixed-type Data Monitoring Using Generalized Linear Model-Adjusted Scheme
2	Estimation And Tests In Linear Mixed Models
3	Constrained Statistical Inference in Generalized Linear, and Mixed Models with Incomplete Data
4	Validation Of The Estimation Of Variance Components And Generalized P Values In The Linear Mixed Model
5	Statistical Analysis Method Of Large Data Based On Linear Mixed Model And Its Application
6	Bayesian Analysis Of A Class Of Generalized Partial Linear Mixed Effect Model For Longitudinal Data
7	Research On Several Issues Of Experimental Design Of Generalized Linear Mixed Effects Model
8	Empirical Likelihood Estimation For Partially Linear Mixed Effects Model With Longitudinal Data
9	Relationship Of BLUPs For Linear Functions Under Mixed Linear Models
10	Statistical Inference For Several Functional Mixed Effects Models