
Discriminant analysis using multi-gene profiles in molecular classification of breast cancer

Posted on: 2006-03-23
Degree: Ph.D
Type: Thesis
University: Columbia University
Candidate: Yan, Xin
Full Text: PDF
GTID: 2454390008962214
Subject: Statistics
Abstract/Summary:
Gene expression data derived from microarrays provide a promising tool for the molecular diagnosis of cancer. However, because such data are high-dimensional and complex, it is challenging to find a reduced set of "informative genes" before a formal classification analysis. In the past few years, many marginal single-gene statistical measures have been applied to expression data, even though gene-gene interactions are non-negligible. In this thesis, in order to capture the interactions among genes, we propose methods based on two statistics: the "Gene Profile Association Score" (GPAS) and the "signed Gene Profile Association Score" (sGPAS). Both statistics are designed to capture high-order gene associations, each through a similar iterative screening process. As a result, they detect not only genes with marginal significance but also genes that carry interaction information. We also build linear prediction models for both GPAS and sGPAS, evaluate their performance on a real microarray data set of 78 breast cancer patients, and compare the results with various existing supervised classification methods. Under a 13-fold cross-validation framework, our proposed statistics empirically outperform all of the marginal predictors considered. In addition, they detect several oncogenes with large p-values, which marginal feature selection measures have not identified. Our findings indicate that GPAS and sGPAS may become useful tools for exploring the complexity of microarray data. These statistics can also be applied to general association-related, high-dimensional pattern recognition problems. Finally, we provide theoretical proofs supporting statistical inference for these scores.
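To illustrate why marginal single-gene measures can miss interacting genes, the following minimal sketch uses a hypothetical toy data set in which the class label depends only on whether two genes disagree. It is not GPAS or sGPAS, whose definitions are not given in this abstract; the data, the function name `mean_difference`, and the product feature are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical toy data: two genes with expression states coded as -1 / +1.
# The class label is 1 when the two genes disagree (an XOR-style interaction).
samples = [(-1, -1), (-1, 1), (1, -1), (1, 1)] * 5
labels = [int(g1 != g2) for g1, g2 in samples]


def mean_difference(values, labels):
    """Marginal measure: difference in mean value between class 1 and class 0."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return mean(pos) - mean(neg)


gene1 = [s[0] for s in samples]
gene2 = [s[1] for s in samples]
# A simple joint feature (product of the two states); an illustrative
# stand-in for an interaction-aware score, not the thesis's statistic.
pair = [g1 * g2 for g1, g2 in samples]

marginal_gene1 = mean_difference(gene1, labels)  # equals 0: no marginal signal
marginal_gene2 = mean_difference(gene2, labels)  # equals 0: no marginal signal
joint_pair = mean_difference(pair, labels)       # equals -2: classes fully separated

print(marginal_gene1, marginal_gene2, joint_pair)
```

In this toy case each gene looks uninformative on its own, while the pair separates the classes perfectly, which is the kind of signal a purely marginal screen would discard.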
Keywords/Search Tags: Gene, Data, Classification