
Deriving meaningful structure from spectral embedding

Posted on: 2006-11-09
Degree: Ph.D.
Type: Dissertation
University: George Mason University
Candidate: Higgs, Brandon W
Full Text: PDF
GTID: 1455390008961882
Subject: Mathematics
Abstract/Summary:
The genomics revolution has led to the generation of multiple disparate data types, ranging from images with inherent spatial distributions to microarray experiments in which the abundance of thousands of mRNA transcripts is measured simultaneously. The objective typically requires extracting meaningful relationships between data elements across conditions or time, yet the large dimensionality and low sampling rate prevent elucidation of useful information in the original data space. By convention, distance between such data elements is limited to linear metrics. Neither the reliance on a simple distance metric nor the high dimensionality is a problem unique to the 'omics fields: reducing high-dimensional data to meaningful low-dimensional representations is often necessary to clarify important relationships and reveal inherent structure. Nonlinear data structures are not accurately represented by strict Euclidean distances and are therefore poorly suited to conventional methods of dimensionality reduction. Such methods generally seek to minimize a global cost function, which tends to distort local associations and misrepresent the inherent connections between points. In contrast, the spectrum of the Laplacian operator preserves these neighborhood geometries as it learns a low-dimensional manifold or surface on which the data lie. To assess the importance of retaining such information, a modified implementation of this method is demonstrated on both gene expression and image data. In addition, we investigate the outcome of applying modified Laplacian and Laplace-Beltrami operators to biological data embedded in a spatial frame, by examining mRNA localization patterns from in situ hybridization images across mouse brain slices. The results are particularly relevant for this type of complex data structure, where most of the relationships between points are unknown.
As part of the study, we explored the parameter space to find the settings that best summarize gene localization patterns and biological function as edges on a weighted graph. Overall, we find that the spectral properties of the Laplacian are particularly applicable to image data, where we can readily solve for the implicit ordering between images, and to gene expression data, as judged by the ability both to classify and cluster points of known disease type and biological function and to render a meaningful embedding.
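
To make the idea of a neighborhood-preserving spectral embedding concrete, the following is a minimal sketch of the standard Laplacian eigenmaps procedure. It is not the modified implementation described in the dissertation, whose details are not given in this abstract. The function name `laplacian_eigenmap`, the k-nearest-neighbor graph, the heat-kernel edge weights, and the parameters `n_neighbors`, `n_components`, and `sigma` are all assumptions chosen for illustration.

```python
import numpy as np

def laplacian_eigenmap(X, n_neighbors=5, n_components=2, sigma=1.0):
    """Illustrative Laplacian eigenmaps embedding (hypothetical helper).

    X is an (n_samples, n_features) array. Returns the smallest nonzero
    eigenvalues and the matching embedding coordinates.
    """
    n = X.shape[0]

    # Pairwise squared Euclidean distances, clamped against round-off.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude each point from its own neighbors

    # k-nearest-neighbor graph with heat-kernel weights (assumed choice).
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)  # symmetrize: keep an edge if either end chose it

    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; index 0 is the trivial one.
    vals, vecs = np.linalg.eigh(L_sym)

    # Map back to solutions of the generalized problem L y = lambda D y.
    Y = vecs[:, 1:n_components + 1] * d_inv_sqrt[:, None]
    return vals[1:n_components + 1], Y
```

Under these assumptions, points sampled along a one-dimensional curve and then shuffled should be placed in curve order by the first returned coordinate `Y[:, 0]`, which is the sense in which such an embedding can recover an implicit ordering. The result depends on `n_neighbors` and `sigma`, so these would need tuning for real gene expression or image data.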
Keywords/Search Tags:Data, Meaningful, Structure