Combining DNA microarray data to improve the stability and accuracy of linear discriminant analysis

Posted on: 2009-10-02
Degree: M.E
Type: Thesis
University: The Cooper Union for the Advancement of Science and Art
Candidate: Tan, Bijun
Full Text: PDF
GTID: 2440390005456719
Subject: Engineering
Abstract/Summary:
A DNA microarray is a small chip designed to indirectly measure the expression levels of thousands of genes at once. In this thesis, three datasets containing microarray data from breast cancer patients are analyzed. Two popular supervised classification methods, k-nearest neighbors (k-NN) and linear discriminant analysis (LDA), are trained on these datasets to predict the estrogen receptor status of the patients. In particular, LDA is observed to be unstable when the number of genes exceeds the number of samples. The problem is that the covariance matrix used in LDA is poorly estimated when the number of features is large and the number of samples is small, which is typical of individual microarray datasets. Combining data from independent but similar studies is therefore explored as a way to improve classification results. Simple normalization methods, such as dividing by the norm of the feature vector, are used to make the datasets more comparable. The results show that the normalized and combined data improve the stability of LDA in all three cases. The accuracy of the classifier is also improved in some cases. The findings indicate that, despite differences in experimental design across studies, it is possible to improve classification performance by combining data. This implies that more standardization in the field would enable more powerful analysis.
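
The covariance problem described above can be illustrated with a minimal Python sketch. It is not the thesis's code or data: the two "studies" are synthetic Gaussian matrices, the sample and gene counts are toy values chosen for illustration, and "dividing by the norm of the feature vector" is assumed here to mean scaling each sample's expression vector to unit Euclidean length. The helper names `normalize_by_norm` and `pooled_covariance` are hypothetical.

```python
import numpy as np


def normalize_by_norm(X):
    """Scale each sample (row) to unit Euclidean norm (assumed normalization)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return X / norms


def pooled_covariance(X, y):
    """Pooled within-class covariance matrix, as used by LDA."""
    n_samples, n_features = X.shape
    classes = np.unique(y)
    S = np.zeros((n_features, n_features))
    for c in classes:
        Xc = X[y == c]
        centered = Xc - Xc.mean(axis=0)
        S += centered.T @ centered
    return S / (n_samples - len(classes))


rng = np.random.default_rng(0)
n_genes = 30  # toy value; real microarrays measure thousands of genes

# Two synthetic "studies" measured on very different scales.
X_a = rng.normal(size=(20, n_genes)) * 100.0
X_b = rng.normal(size=(25, n_genes)) * 0.5
y_a = np.tile([0, 1], 10)           # 20 made-up binary labels
y_b = np.array([0, 1] * 12 + [0])   # 25 made-up binary labels

# Single study: 20 samples and 2 classes give rank at most 18 < 30 genes,
# so the covariance estimate is singular and cannot be inverted.
S_single = pooled_covariance(normalize_by_norm(X_a), y_a)
rank_single = np.linalg.matrix_rank(S_single)

# Combined studies: normalize each study, then stack the samples.
X_comb = np.vstack([normalize_by_norm(X_a), normalize_by_norm(X_b)])
y_comb = np.concatenate([y_a, y_b])
S_comb = pooled_covariance(X_comb, y_comb)
rank_comb = np.linalg.matrix_rank(S_comb)

print("genes:", n_genes)
print("rank, single study:", rank_single)
print("rank, combined studies:", rank_comb)
```

In this toy setting, the single-study estimate cannot exceed rank 18, so LDA would need a pseudo-inverse or regularization, while the combined data have enough samples for the estimate to approach full rank. With thousands of genes, combining a few studies would not by itself remove the rank deficiency, so the sketch shows only the direction of the effect.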
Keywords/Search Tags:Data, Microarray, Improve, LDA