Analysis of microarray data

Posted on: 2006-05-17
Degree: Ph.D.
Type: Thesis
University: Yale University
Candidate: Duan, Fenghai
GTID: 2450390005495570
Subject: Biology
Abstract/Summary:
DNA microarray data analysis has been an active statistical topic in recent years because of its wide applications in biomedical fields and its complicated data structure. In this thesis, I discuss three levels of microarray data analysis: normalizing Affymetrix GeneChips®, identifying significantly differentially expressed (SDE) genes, and clustering similarly expressed genes into groups.

In the first section, I address a particular spatial effect on the images of certain Affymetrix GeneChips®, which I call the texture effect. I show that the common normalization methods fail to correct the texture effect, which in turn affects the identification of differentially expressed genes. To resolve this problem, I explore a way to assess and correct the texture effect by modeling its correlation structure and periodicity.

In the second section, I compare the performance of several approaches for identifying differentially expressed genes from the probe-level data of Affymetrix GeneChips®. I focus on the comparison between summarization methods and non-summarization methods. For the summarization methods, I first present a theoretical result showing that the difference between MAS 5 (a single-chip approach) and RMA (a multi-chip approach) comes from the incorporation of mismatch probes and the use of different robust algorithms, not from their "single-chip" and "multi-chip" properties. For the non-summarization methods, I compare fixed probe-effect modeling with random probe-effect modeling (RPM) in the identification of SDE genes. I show that fixed probe-effect modeling, like the summarization methods (MAS 5 and RMA), tends to be over-optimistic in estimating the variances when identifying SDE genes. In contrast, random probe-effect modeling performs much better than the other methods with respect to coverage probability in the simulation studies. The Affymetrix spike-in dataset and a mouse dataset are used to demonstrate the advantage of random probe-effect modeling.

In the last section, I first show that for the widely cited yeast cell cycle data of Spellman et al. (1998), the standard clusterings are deficient because of the loss of synchrony. I then propose a method to improve the performance of k-means by assigning decreasing weights to its variables, and I evaluate this "weighted k-means" on a simulated dataset and on the Spellman et al. (1998) yeast cell cycle data. Protein complexes from a public website are used as biological benchmarks. The results show that an exponentially decreasing weight function applied to the variables of k-means generally increases the agreement between protein complexes and k-means clusters.
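The weighted k-means idea from the last section can be illustrated with a minimal sketch. The abstract does not give the exact weight function, its parameters, or the implementation, so everything named here is an assumption made for illustration: the form w_j = exp(-rate * j) over variables ordered by time point, the `rate` parameter, and the helper names `exponential_weights` and `weighted_kmeans`. The sketch uses a plain Lloyd-style iteration with a weighted squared Euclidean distance, which may differ from the procedure used in the thesis.

```python
import numpy as np


def exponential_weights(n_vars, rate):
    """Hypothetical weight function: w_j = exp(-rate * j), normalized to sum to 1.

    Variable 0 (the earliest time point) gets the largest weight.
    """
    w = np.exp(-rate * np.arange(n_vars))
    return w / w.sum()


def weighted_kmeans(X, k, weights, n_iter=100, seed=0):
    """Lloyd-style k-means using a per-variable weighted squared distance.

    X       : (n_samples, n_vars) array, e.g. genes by time points
    weights : (n_vars,) non-negative weights applied to each variable
    Returns (labels, centers).
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)

    # Initialize the centers with k distinct rows of X (copied by fancy indexing).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)

    for _ in range(n_iter):
        # Weighted squared distance from every row to every center.
        diff = X[:, None, :] - centers[None, :, :]
        dist = (weights * diff ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)

        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels

        # The per-variable mean still minimizes the weighted squared distance.
        for c in range(k):
            members = X[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)

    return labels, centers


# Toy data: two groups separated at the first time point only,
# with unrelated fluctuations at the later time points.
X = np.array([
    [0.0, 1.0, -1.0],
    [0.2, -1.0, 1.0],
    [0.4, 1.0, 1.0],
    [10.0, -1.0, -1.0],
    [10.2, 1.0, -1.0],
    [10.4, -1.0, 1.0],
])
w = exponential_weights(n_vars=3, rate=5.0)
labels, centers = weighted_kmeans(X, k=2, weights=w)
```

Because the weights are fixed per variable, the same clustering could also be obtained by scaling each column of `X` by the square root of its weight and running an ordinary k-means; the explicit distance is written out here only to make the weighting visible.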
Keywords/Search Tags:Data, Microarray, Probe-effect modeling, Texture effect, Summarization methods, K-means