DNA Microarray Analysis Based On Machine Learning

Posted on:2009-10-04

Degree:Master

Type:Thesis

Country:China

Candidate:C Liao

Full Text:PDF

GTID:2120360242490615

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

In medicine, diagnosing the type and stage of a cancer is essential for choosing the appropriate treatment, but traditional diagnostic methods have many limitations for this task. Tumor development is usually accompanied by complex changes in gene expression, and these changes provide new tools for tumor diagnosis and prediction.

The experiments used to obtain gene expression data introduce noticeable measurement errors. Moreover, because these experiments are expensive, a data set usually contains only a few samples but thousands of genes, many of them irrelevant: a typical high-dimensional, noisy problem. There are also many redundant genes, because genes with similar functions have highly correlated expression levels. It is therefore worthwhile to select discriminative genes in order to improve the accuracy of tumor diagnosis.

This thesis focuses on gene selection, feature extraction, and classifier design for microarray data analysis. Its main contributions are as follows.

First, several new gene selection and feature extraction algorithms are proposed to choose informative genes. These genes form a new, lower-dimensional subset for classification:
1. Feature extraction based on the discrete wavelet transform (DWT): the gene expression data are first preprocessed by a t-test and then decomposed by the DWT into approximation and high-frequency coefficients. A maximum modulus method selects some of the high-frequency coefficients. Together with all of the approximation coefficients, these form the new subset.
2. Feature extraction using kernel methods: the microarray data are first preprocessed by a t-test and then processed by kernel methods to obtain a lower-dimensional subset.
3. Gene selection using a support vector machine (SVM): the gene expression data are preprocessed by the Wilcoxon rank-sum test.
Then an SVM classifier is trained on each gene individually and tested on that same gene's samples to obtain its accuracy. The genes with the highest accuracy are selected to form the new subset.

Second, an SVM ensemble classifier is used to improve accuracy. The microarray data are first preprocessed by the Wilcoxon rank-sum test. Three classifiers are then obtained step by step, each trained on samples selected according to classification confidence, and together they form an SVM ensemble classifier.

Experiments show that the proposed methods achieve classification accuracy comparable to state-of-the-art methods.
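
The t-test preprocessing used in methods 1 and 2 can be pictured as a per-gene ranking. This is a minimal sketch, not the thesis's implementation. It assumes two classes labelled 0 and 1 and uses Welch's unequal-variance t statistic; the abstract does not say which t-test variant is used. It keeps the `top_k` genes with the largest absolute statistic. The names `welch_t`, `rank_genes_by_t`, and `top_k` are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic for one gene (class 0 values vs class 1 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # Both classes are constant: the statistic is undefined, so treat the gene as uninformative.
        return 0.0
    return (mean(a) - mean(b)) / se


def rank_genes_by_t(X, y, top_k):
    """Indices of the top_k genes with the largest |t|, best first.

    X: list of samples, each a list of expression values (one per gene).
    y: list of 0/1 class labels, one per sample (each class needs >= 2 samples).
    """
    n_genes = len(X[0])
    abs_t = []
    for g in range(n_genes):
        a = [x[g] for x, lab in zip(X, y) if lab == 0]
        b = [x[g] for x, lab in zip(X, y) if lab == 1]
        abs_t.append(abs(welch_t(a, b)))
    # sorted() is stable, so ties keep the lower gene index first.
    return sorted(range(n_genes), key=lambda g: -abs_t[g])[:top_k]
```

The selected indices would then be used to restrict each sample to its most discriminative genes before the DWT or kernel step.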
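
Method 1 builds its features from DWT coefficients. The abstract does not name the wavelet, the decomposition depth, or exactly how the maximum modulus method picks coefficients. This sketch makes three assumptions:

- It uses a one-level orthonormal Haar wavelet.
- It reads "maximum modulus" as keeping the detail-coefficient positions with the largest mean absolute value over the training samples.
- Every sample keeps those same positions, so all feature vectors line up.

The names `haar_dwt`, `dwt_features`, and `n_detail` are hypothetical.

```python
import math


def haar_dwt(signal):
    """One-level orthonormal Haar DWT.

    Returns (approximation, detail) coefficient lists. An odd-length input
    is padded by repeating its last value.
    """
    s = list(signal)
    if len(s) % 2:
        s.append(s[-1])
    r = 1.0 / math.sqrt(2.0)
    approx = [(s[i] + s[i + 1]) * r for i in range(0, len(s), 2)]
    detail = [(s[i] - s[i + 1]) * r for i in range(0, len(s), 2)]
    return approx, detail


def dwt_features(X_train, X_test, n_detail):
    """All approximation coefficients plus the n_detail detail positions
    with the largest mean modulus on the training samples."""
    train = [haar_dwt(x) for x in X_train]
    test = [haar_dwt(x) for x in X_test]
    n_pos = len(train[0][1])
    # Mean |detail coefficient| at each position, over training samples only.
    mean_mod = [sum(abs(d[j]) for _, d in train) / len(train) for j in range(n_pos)]
    keep = sorted(range(n_pos), key=lambda j: -mean_mod[j])[:n_detail]
    keep.sort()  # preserve the original coefficient order in the feature vector

    def build(coeffs):
        return [a + [d[j] for j in keep] for a, d in coeffs]

    return build(train), build(test), keep
```

The DWT pairs neighbouring entries, so the result depends on how the genes are ordered, for example in the t-test ranking order. The abstract does not specify this ordering.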
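
Method 3 works in two stages. It first ranks genes with the Wilcoxon rank-sum test. It then scores each surviving gene by how well a classifier trained on that gene alone fits the same samples. The sketch below is a simplified stand-in with two assumptions:

- It uses the normal approximation to the rank-sum statistic without tie correction.
- It replaces the per-gene SVM with a one-gene threshold rule. On a single feature, a linear SVM's decision function also reduces to a threshold, although the SVM would not necessarily choose the accuracy-maximizing one.

All function names and the parameters `n_prefilter` and `n_final` are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # i..j are 0-based positions in sorted order
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(a, b):
    """z score of the Wilcoxon rank-sum statistic of a (normal approximation, no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    w = sum(ranks[:n1])  # rank sum of the first group
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mu) / sigma


def stump_accuracy(values, labels):
    """Best resubstitution accuracy of a one-gene threshold rule.

    Stand-in for a per-gene SVM: predict class 1 when value > t, or the reverse.
    """
    n = len(values)
    cuts = sorted(set(values))
    thresholds = [cuts[0] - 1.0] + [(lo + hi) / 2 for lo, hi in zip(cuts, cuts[1:])]
    best = 0.0
    for t in thresholds:
        correct = sum((v > t) == (lab == 1) for v, lab in zip(values, labels))
        best = max(best, correct / n, (n - correct) / n)
    return best


def select_genes(X, y, n_prefilter, n_final):
    """Rank-sum prefilter, then keep the genes whose one-gene rule fits best."""
    n_genes = len(X[0])
    cols = [[x[g] for x in X] for g in range(n_genes)]
    z = []
    for c in cols:
        a = [v for v, lab in zip(c, y) if lab == 0]
        b = [v for v, lab in zip(c, y) if lab == 1]
        z.append(abs(rank_sum_z(a, b)))
    candidates = sorted(range(n_genes), key=lambda g: -z[g])[:n_prefilter]
    acc = {g: stump_accuracy(cols[g], y) for g in candidates}
    return sorted(candidates, key=lambda g: -acc[g])[:n_final]
```

Swapping `stump_accuracy` for an actual SVM trained and tested on each gene's column would bring the sketch closer to the procedure described above.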

Keywords/Search Tags:

machine learning, gene selection, classifier, DWT, kernel methods, SVM

Related items

1	Researches On Gene Selection Algorithm With Support Vector Machine
2	Multiple Kernel Learning For Predicting Protein Secondary Structures
3	Research On Intelligent Selection Of Road Network Automatic Generalization Based On Kernel-based Machine Learning
4	Gene Data Classification Research Based On The Improved Particle Swarm Optimization And Extreme Learning Machine
5	Characterization And Machine Learning Prediction Of Allele-Specific DNA Methylation
6	Improving Gene Structure Prediction By Combining Multiple Sources Of Evidence
7	Research On Several Learning Problems Based On Kernel Alignment
8	An Implementation Method For Minimal VC Dimensional Classifier
9	The Algorithm Research For Genomic Selection Study Based On Machine Learning
10	Research On Clustering Methods Of Single-cell RNA Sequence Based On Machine Learning