Identifying differentially expressed genes, calculating sample sizes and computing gene expression indexes in microarray experiments

Posted on: 2004-07-29
Degree: Ph.D.
Type: Dissertation
University: The University of North Carolina at Chapel Hill
Candidate: Hu, Jianhua
GTID: 1464390011464767
Subject: Biology
Abstract/Summary:
The identification of the genes that are most differentially expressed in two-sample microarray experiments remains a difficult problem when the number of arrays is very small. We discuss the implications of using ordinary t statistics and examine other commonly used variants. For multi-probe oligonucleotide arrays, we introduce a simple model relating the mean and variance of expression, possibly with gene-specific random effects. The model is used to obtain maximum likelihood estimates of the degree of differential expression. The estimates have natural shrinkage properties that dispense with the need for ad hoc approaches to guard against inappropriately small variance estimates, and they allow comparison with the Cramér-Rao lower bound. We show that the estimates are highly efficient, even for small sample sizes. We demonstrate that our approach performs well compared with other proposed approaches in terms of the false discovery rate. It also performs well under a new criterion based on the area under the receiver operating characteristic (ROC) curve, in which the degree of differential expression is compared between rejected and non-rejected genes. The likelihood framework suggests straightforward extensions to comparisons of multiple groups, and possibly further shrinkage steps applied to the likelihood estimates.

Because of experimental cost and the difficulty of obtaining biological material, choosing an appropriate sample size is essential in microarray studies. A few papers have addressed this problem, but none has based the calculation directly on the false discovery rate (FDR) or related criteria. With the growing use of the FDR in microarray analysis, an FDR-based sample size calculation is needed. Our approach explicitly connects the sample size to the FDR and to the number of differentially expressed genes to be detected, using parametric models for the degree of differential expression.
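The abstract does not spell out the parametric model. As a rough illustration of how sample size, the FDR, and the number of genes to detect can be linked, the sketch below assumes a two-sided two-sample z-test (normal approximation), a single standardized effect size `delta` shared by all `m1` truly differentially expressed genes out of `m`, and the expected-count approximation FDR ≈ E[false positives] / (E[false positives] + E[true positives]). The function names and parameters are hypothetical, and this is not the dissertation's model.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_sample(n, delta, alpha):
    """Approximate power of a two-sided two-sample z-test, n arrays per group.

    Assumption: delta (mean difference / SD) is common to every truly
    differentially expressed gene.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    ncp = delta * (n / 2) ** 0.5  # noncentrality of the z statistic
    return _STD_NORMAL.cdf(ncp - z_crit) + _STD_NORMAL.cdf(-ncp - z_crit)


def expected_fdr(m, m1, n, delta, alpha):
    """Expected-count FDR approximation for a per-gene threshold alpha."""
    false_pos = (m - m1) * alpha
    true_pos = m1 * power_two_sample(n, delta, alpha)
    return false_pos / (false_pos + true_pos)


def sample_size_for_fdr(m, m1, delta, target_fdr, min_detected, n_max=100):
    """Smallest n per group for which some alpha on a log grid keeps the
    approximate FDR <= target_fdr while detecting >= min_detected genes
    on average. Returns (n, alpha), or None if n_max is not enough."""
    alphas = [10 ** (-k / 10) for k in range(10, 81)]  # 0.1 down to 1e-8
    for n in range(2, n_max + 1):
        for alpha in alphas:
            if (expected_fdr(m, m1, n, delta, alpha) <= target_fdr
                    and m1 * power_two_sample(n, delta, alpha) >= min_detected):
                return n, alpha
    return None
```

For example, `sample_size_for_fdr(m=10000, m1=200, delta=1.0, target_fdr=0.05, min_detected=150)` returns the smallest per-group n found on the grid together with the per-gene threshold that achieves it. The threshold is scanned rather than fixed because the design target is the FDR and the number of detections, not a preset per-gene significance level.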
The applicability of the proposed method is shown through simulations and the analysis of a real data set. The model-based approach yields FDR estimates comparable to those of resampling-based methods and offers greater interpretability.

A widely used type of microarray is the multi-probe oligonucleotide array, which has the attractive feature of probe redundancy. The term “expression index” refers to a statistic, estimated from the raw hybridization intensities on the array, that represents the expression level of a particular gene. Careful estimation of expression indexes is important because all subsequent statistical inferences in an experiment are based on them. We discuss two approaches to constructing expression indexes using the singular value decomposition (SVD). The first adaptively estimates expression indexes from the raw intensities of oligonucleotide array probes and requires a data structure often employed in array experiments. We show that a popular model-based expression index proposed by Li and Wong (2001a) is a special case of these estimates. The second approach uses a data transformation guided by an entropy measure, which is itself computed from the SVD-based estimates. The entropy-based approach can be viewed as a means of improving model fit, and thereby the estimated expression index. The methods are demonstrated with simulations and applications to two real data sets.
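The Li and Wong (2001a) index is commonly written as the multiplicative model y_ij = theta_i * phi_j + e_ij for array i and probe j (for example on PM - MM differences), with the constraint sum_j phi_j^2 = J. Its least-squares fit is the best rank-one approximation of the array-by-probe matrix, which is the leading term of the SVD (Eckart-Young theorem). The sketch below assumes that plain least-squares setting, with no outlier handling and no entropy-based transformation, and fits it by alternating least squares, which amounts to power iteration for the leading singular pair. The function name is hypothetical, and this is not the dissertation's adaptive SVD estimator.

```python
import math


def rank_one_expression_index(y, n_iter=500, tol=1e-12):
    """Least-squares fit of y[i][j] ~ theta[i] * phi[j] for one probe set.

    y: rows are arrays, columns are probes. Returns (theta, phi), scaled so
    that sum(phi_j ** 2) == J, the number of probes.
    Alternating least squares is power iteration on y^T y, so under mild
    conditions theta * phi^T converges to the leading SVD term s1 * u1 * v1^T.
    """
    n_arrays, n_probes = len(y), len(y[0])

    def fit_theta(phi):
        # For fixed probe effects, each theta_i is a 1-D regression on phi.
        ss_phi = sum(p * p for p in phi)
        return [sum(row[j] * phi[j] for j in range(n_probes)) / ss_phi for row in y]

    phi = [1.0] * n_probes  # start from equal probe affinities
    for _ in range(n_iter):
        theta = fit_theta(phi)
        ss_theta = sum(t * t for t in theta)
        if ss_theta == 0:
            raise ValueError("data carry no signal along the current probe effects")
        # For fixed expression levels, each phi_j is a 1-D regression on theta.
        new_phi = [sum(y[i][j] * theta[i] for i in range(n_arrays)) / ss_theta
                   for j in range(n_probes)]
        # Rescale so sum(phi^2) == J; theta is refit against this scale next.
        scale = math.sqrt(n_probes / sum(p * p for p in new_phi))
        new_phi = [p * scale for p in new_phi]
        converged = max(abs(a - b) for a, b in zip(new_phi, phi)) < tol
        phi = new_phi
        if converged:
            break

    # Resolve the sign ambiguity of the SVD: make probe effects mostly positive.
    if sum(phi) < 0:
        phi = [-p for p in phi]
    return fit_theta(phi), phi
```

The returned `theta` plays the role of a Li-Wong-style expression index for each array, and `phi` holds the probe affinities. Power iteration keeps the sketch dependency-free; with NumPy, `numpy.linalg.svd` gives the same leading component directly.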
Keywords/Search Tags: Expression, Differentially expressed, Genes, Sample, Microarray, Data, FDR, Estimates