DNA Microarray Analysis Based On Machine Learning

Posted on: 2009-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: C Liao
Full Text: PDF
GTID: 2120360242490615
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
In medicine, diagnosing the type and stage of a cancer is very important for choosing the appropriate treatment. However, traditional methods have many limitations for this kind of diagnosis. Tumor development is usually accompanied by complicated changes in gene expression, which provide new tools for tumor diagnosis and prediction.

The experiments used to obtain gene expression data introduce noticeable measurement inaccuracy. Meanwhile, because the experiments are expensive, there are usually only a few samples but hundreds of thousands of genes, many of which are irrelevant. This is a typical high-dimensional, noisy problem. Moreover, there are many redundant genes, because the expression levels of genes with similar functions are highly correlated. It is therefore meaningful to select discriminative genes to improve the accuracy of tumor diagnosis.

This thesis focuses on microarray data analysis, covering gene selection, feature extraction, and classifier design. The main contributions are as follows.

First, several new gene selection and feature extraction algorithms are proposed to choose informative genes, which form a new, lower-dimensional subset for classification:

1. Feature extraction based on the Discrete Wavelet Transform (DWT): Gene expression data is preprocessed with a t-test and then decomposed by DWT into approximation coefficients and high-frequency coefficients. The maximum modulus method is used to select some of the high-frequency coefficients, which together with all of the approximation coefficients form a new subset.
2. Feature extraction using kernel methods: Microarray data is first preprocessed with a t-test and then processed by kernel methods to obtain a lower-dimensional subset.
3. Gene selection using the Support Vector Machine (SVM): Gene expression data is preprocessed with the Wilcoxon rank sum test. An SVM classifier is then trained on each gene individually and tested on that same gene to obtain its accuracy. The genes with the highest accuracy are picked out to form the new subset.

Second, an SVM ensemble classifier is used to increase accuracy: The Wilcoxon rank sum test is used to preprocess the microarray data. Three classifiers are then obtained step by step, each using samples chosen by confidence, and these classifiers are combined into an SVM ensemble classifier.

Experiments show that the classification accuracy of the proposed methods can reach the state-of-the-art level.
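As an illustration of the DWT-based feature extraction summarized above, the following is a minimal sketch and not the thesis's actual implementation. The abstract does not name the wavelet, the number of decomposition levels, or exactly how the maximum modulus rule is applied, so the sketch assumes a one-level Haar wavelet, a Welch t-statistic for the t-test preprocessing, and selection of the high-frequency positions whose largest absolute value across samples is greatest. All function names and the toy data are hypothetical.

```python
import math

def welch_t(a, b):
    """Welch t-statistic for two lists of expression values (assumed t-test variant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    denom = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / denom if denom > 0 else 0.0

def rank_genes(samples, labels, n_keep):
    """Indices of the n_keep genes with the largest |t|, returned in original gene order."""
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        a = [s[g] for s, y in zip(samples, labels) if y == 0]
        b = [s[g] for s, y in zip(samples, labels) if y == 1]
        scores.append(abs(welch_t(a, b)))
    top = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:n_keep]
    return sorted(top)

def haar_dwt(signal):
    """One-level Haar DWT of an even-length list: (approximation, detail) coefficients."""
    r = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / r for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / r for i in range(0, len(signal), 2)]
    return approx, detail

def dwt_features(samples, labels, n_genes_keep, n_detail_keep):
    """t-test filter, Haar DWT per sample, then keep all approximation coefficients
    plus the n_detail_keep detail positions with the largest modulus across samples.
    n_genes_keep must be even so the one-level Haar transform applies."""
    keep = rank_genes(samples, labels, n_genes_keep)
    decomposed = [haar_dwt([s[g] for g in keep]) for s in samples]
    n_detail = len(decomposed[0][1])
    # Assumed reading of "maximum modulus": largest |coefficient| at each position.
    modulus = [max(abs(d[j]) for _, d in decomposed) for j in range(n_detail)]
    ranked = sorted(range(n_detail), key=lambda j: modulus[j], reverse=True)
    chosen = sorted(ranked[:n_detail_keep])
    return [a + [d[j] for j in chosen] for a, d in decomposed]

# Toy data (hypothetical): 4 samples x 6 genes, two classes.
samples = [
    [1.0, 5.0, 2.0, 2.1, 9.0, 3.0],
    [1.2, 5.1, 2.2, 1.9, 9.5, 3.1],
    [4.0, 5.0, 2.1, 2.0, 1.0, 3.0],
    [4.3, 5.2, 1.9, 2.2, 1.5, 3.2],
]
labels = [0, 0, 1, 1]
features = dwt_features(samples, labels, n_genes_keep=4, n_detail_keep=1)
```

Each row of `features` holds the approximation coefficients followed by the selected high-frequency coefficients, and could be passed to any classifier. In practice a wavelet library and a deeper decomposition would likely replace the hand-written Haar step.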
Keywords/Search Tags: machine learning, gene selection, classifier, DWT, kernel methods, SVM