Resampling methods for variable selection and classification: Applications to genomics

Posted on: 2002-12-01
Degree: Ph.D
Type: Thesis
University: University of California, Berkeley
Candidate: Fridlyand, Yevgeniya Jane M
Full Text: PDF
GTID: 2460390011490885
Subject: Statistics
Abstract/Summary:
The last decade has been characterized by an explosion of biological sequence information. In traditional statistical analysis, the number of observations far exceeds the number of variables in the data; this is no longer the case with many types of data arising in genomics. The challenge is to make sense of the sequence information: revolutionary computational methods for data analysis are sorely needed.

This thesis addresses variable selection and classification. The work is divided into two parts, both of which make extensive use of resampling methods. First, a tree-based method for variable selection is developed. This method is presented in the context of finding quantitative trait loci in mouse experimental crosses, but it can be applied to data arising in human single nucleotide polymorphism (SNP) studies and, eventually, to microarray gene expression data. Second, a novel prediction-based method for determining the number of clusters in the data is introduced, and a bootstrap aggregation method for improving the results of an arbitrary clustering algorithm is presented. The techniques developed in the second part of the thesis are illustrated using microarray data.
Keywords/Search Tags:Variable selection, Data, Method