Analysis of microarray data

Posted on:2006-05-17

Degree:Ph.D

Type:Thesis

University:Yale University

Candidate:Duan, Fenghai

Full Text:PDF

GTID:2450390005495570

Subject:Biology

Abstract/Summary:

DNA microarray data analysis has been an active statistical topic in recent years because of its wide application in biomedical fields and its complicated data structure. In this thesis, I discuss three levels of microarray data analysis: normalizing Affymetrix GeneChips®, identifying significantly differentially expressed (SDE) genes, and clustering similarly expressed genes into groups.

In the first section, I address a particular spatial effect on the images of certain Affymetrix GeneChips®, which I call the texture effect. I show that common normalization methods fail to correct the texture effect, which in turn affects the identification of differentially expressed genes. To resolve this problem, I explore a way to assess and correct the texture effect by modeling its correlation structure and periodicity.

In the second section, I compare the performance of several approaches for identifying differentially expressed genes from probe-level Affymetrix GeneChip® data, focusing on the comparison between summarization and non-summarization methods. For the summarization methods, I first present a theoretical result showing that the difference between MAS 5 (a single-chip approach) and RMA (a multi-chip approach) actually comes from the incorporation of mismatch probes and the use of different robust algorithms, rather than from their "single-chip" and "multi-chip" properties. For the non-summarization methods, I compare fixed probe-effect modeling with random probe-effect modeling (RPM) in the identification of SDE genes. I show that fixed probe-effect modeling, together with the summarization methods (MAS 5 and RMA), tends to be over-optimistic in estimating variances when identifying SDE genes. In simulation studies, random probe-effect modeling performs much better than the other methods with respect to coverage probability. The Affymetrix spike-in dataset and a mouse dataset are used to demonstrate the advantage of random probe-effect modeling.

In the last section, I first show that for the widely cited yeast cell cycle data of Spellman et al. (1998), standard clustering methods are deficient because of the loss of synchrony. I then propose a method to improve the performance of k-means by assigning decreasing weights at its variable level, and I evaluate this "weighted k-means" on a simulated dataset and on the Spellman et al. (1998) yeast cell cycle data. Protein complexes from a public website are used as biological benchmarks. The results show that an exponentially decreasing weight function assigned at the variable level of k-means generally increases the agreement between protein complexes and k-means clusters.
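As a rough illustration of the weighted k-means idea in the last section, the sketch below runs Lloyd's algorithm with a weighted squared Euclidean distance. It is not the thesis's implementation: the abstract only says that an exponentially decreasing weight is assigned at the variable level, so the weight form `exp(-rate * j)`, the `rate` value, the assumption that variables are ordered by time point, and the names `exponential_weights`, `weighted_kmeans`, and `init_idx` are all hypothetical choices made here.

```python
import numpy as np


def exponential_weights(n_vars, rate=0.1):
    """Assumed weight form: w_j = exp(-rate * j) for variable index j = 0, 1, ..."""
    return np.exp(-rate * np.arange(n_vars))


def weighted_kmeans(X, k, weights, n_iter=100, seed=0, init_idx=None):
    """Lloyd's k-means with distance sum_j w_j * (x_j - c_j)^2.

    init_idx (hypothetical helper argument) optionally fixes which rows of X
    serve as the initial centers; otherwise k rows are drawn at random.
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Scaling column j by sqrt(w_j) turns the ordinary squared Euclidean
    # distance on Z into the weighted distance on X.
    scale = np.sqrt(weights)
    Z = X * scale
    if init_idx is None:
        rng = np.random.default_rng(seed)
        init_idx = rng.choice(len(Z), size=k, replace=False)
    centers = Z[np.asarray(init_idx)].copy()
    for _ in range(n_iter):
        # Squared distance from every row to every center: shape (n, k).
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center; an empty cluster keeps its previous center.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Return centers on the original (unscaled) variable scale.
    return labels, centers / scale


if __name__ == "__main__":
    # Toy data: two groups that differ only in the first two variables.
    rng = np.random.default_rng(1)
    X = rng.normal(0.0, 0.1, size=(10, 6))
    X[5:, :2] += 10.0
    w = exponential_weights(X.shape[1], rate=0.5)
    labels, centers = weighted_kmeans(X, k=2, weights=w, init_idx=[0, 5])
    print(labels)
```

Under this weighting, early variables dominate the distance and later ones contribute progressively less, which is one plausible way to down-weight time points where synchrony has been lost.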

Keywords/Search Tags:

Data, Microarray, Probe-effect modeling, Texture effect, Summarization methods, K-means

Related items

1	Machine learning methods for microarray data analysis
2	Morphology-Based Modeling Of Aggregation Effect On The Categorical Raster Data
3	Comparison Of Normalization Methods And Weighted Co-expression Network Analysis Based On MiRNA Microarray Data
4	The Utilization And Mining Of Bioinformational Data In Microarray Technology
5	Research On Summarization Methods For Large RDF Graphs
6	Research On Microarray Data Analysis And Application Of Microarray On Neurobiology
7	Statistical methods in experimental design and analysis of microarray data
8	Study On The Design And Preparation Of Human Parvovirus B19 Microarray
9	Statistical methods for the analysis of genetic marker and microarray data
10	Joint modeling time-to-event and longitudinal data using Markov chain Monte Carlo methods with application to the Proscar(TM) long-term efficacy and safety study