Application Of Support Vector Machines In Microarray Gene Expression Data Classification

Posted on:2005-01-03

Degree:Master

Type:Thesis

Country:China

Candidate:C Wu

Full Text:PDF

GTID:2144360125468465

Subject:Epidemiology and Health Statistics

Abstract/Summary:

The widely used gene chip (microarray) technology has brought an explosive increase in microarray experiment data, also called gene expression data. Gene expression data is typically large in scale, with an imbalance between the numbers of observations and samples in the data matrix. Datasets from different sources also contain many missing values. Most traditional statistical methods cannot handle such datasets, so researchers must look for new methods. In the early days, gene expression data was usually analyzed with clustering algorithms, which produced some credible results. As more gene classes became known, researchers needed more effective algorithms that use this information to predict the functions of unknown genes. Supervised algorithms based on accurate results of biological experiments have therefore become the new focus of gene expression data analysis. Among them, support vector machines (SVMs), which are based on statistical learning theory, are one of the newest supervised algorithms. They have many features that make them attractive for gene expression analysis, including flexibility in choosing a similarity function, sparseness of the solution when treating large data sets, the ability to handle large feature spaces, and the ability to identify outliers. However, because SVMs are a new machine learning technology, few chip researchers know them, and there are few papers on how to analyze gene expression data using SVMs. When chip experimenters and biologists obtain their valuable first-hand material, they may miss important information because of deficiencies in the analysis algorithms.

This paper describes the SVM algorithm in light of the current state of gene expression data analysis and gives a specific SVM algorithm and training process for gene expression data. We mainly describe how to establish a complete SVM process for gene expression data based on a well-known gene expression database, MYGD, provided by MIPS. In addition, we improved the SVM algorithm in two respects, training speed and prediction accuracy, applied the improved algorithms to gene expression data analysis, and obtained encouraging results. The experimental results and a discussion of related problems are given at the end of the paper, together with the problems that remain unresolved and the work we plan to do next.

Because gene expression data, despite its particular features, can be analyzed with a general statistical analysis process, the paper starts with data cleaning, including missing value filling and data normalization. In this part, we compare three methods of filling missing gene expression values and several normalization methods. Based on the cleaned dataset, we introduce different SVM kernel functions and several feasible SVM software processes for gene expression data classification. We also introduce two improved SVM algorithms, SISVM and SVM-KNN, for treating gene expression data.

Through the experiments, we obtained the following results. First, KNN filling and filling with class means are better than the other filling method; the difference between these two is not statistically significant, so researchers should select either of them according to the aim of the study. Second, compared with other kernel functions, the RBF SVM and the higher-degree polynomial SVM are better at recognizing genes of the same functional class from gene expression data. Third, our SVM process is very easy to use, and we provide some programs to help researchers carry it out. We compared it with a commonly used SVM algorithm on the same dataset, and the results showed that the two have the same prediction accuracy and training speed. Fourth, the SVM-KNN algorithm can increase the accuracy of the model to some degree, and SISVM can raise the training speed without losing prediction accuracy in gene expression data analysis.

All in all, as a new tool for treating microarray data, SVMs rest on sound theory and have promising prospects. SVMs and their improved algorithms will show advantages in a much wider field of gene research.
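
The two missing-value filling strategies that the abstract reports as performing best, class-mean filling and KNN filling, can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the code used in the thesis: the function names, the use of `None` to mark a missing value, the choice of `k`, and the distance computed over columns observed in both rows are all hypothetical choices made for illustration.

```python
import math


def class_mean_impute(rows, labels):
    """Fill each missing entry (None) with the mean of the same column
    over rows that share the same class label (assumed strategy)."""
    filled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        # Collect the observed values of column j, grouped by class.
        by_class = {}
        for r, y in zip(rows, labels):
            if r[j] is not None:
                by_class.setdefault(y, []).append(r[j])
        for i, y in enumerate(labels):
            if rows[i][j] is None:
                observed = by_class.get(y)
                if observed:  # stays None if the class has no observed value
                    filled[i][j] = sum(observed) / len(observed)
    return filled


def knn_impute(rows, k=2):
    """Fill each missing entry (None) with the average of that column
    over the k nearest rows that observe it. Distance is a root mean
    squared difference over columns observed in both rows (assumption)."""
    filled = [list(r) for r in rows]
    for i, r in enumerate(rows):
        for j, v in enumerate(r):
            if v is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(r, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                d = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((d, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            if nearest:
                filled[i][j] = sum(val for _, val in nearest) / len(nearest)
    return filled


# Toy expression matrix (rows = genes, columns = experiments); values are made up.
rows = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.2, 2.2, 5.0],
    [9.0, 8.0, 7.0],
    [9.0, None, 9.0],
]
labels = ["a", "a", "a", "b", "b"]

by_class_mean = class_mean_impute(rows, labels)
by_knn = knn_impute(rows, k=2)
```

Class-mean filling needs the functional class of each gene, so it suits supervised settings where labels are already known; KNN filling uses only the expression values, which may matter when the class labels are the quantity to be predicted.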

Keywords/Search Tags:

Gene Expression Data, Pattern Classification, KNN method, Support Vector Machines, SISVM algorithm

Related items

1	Research On Tumor Classification Algorithm Based On Gene Expression Data
2	Gene Expression Profile Data Classification Based On Support Vector Machine And Genetic Algorithm
3	Research On SVM Heart Sound Signal Classification Algorithm Based On Optimization
4	Research On Classification Algorithm In Heart Disease Predicting And Diagnosing
5	Classification Of AD Progression Using SVM Model On Structure MRI Data
6	Study Of Tumor Characteristic Gene Extraction Method
7	Gene Expression Pattern Recognition And Classification Of Alzheimer’s Disease Based On Deep Learning
8	Research On Intelligent Optimization Algorithm And Its Application In Syndrome Classification Of Type 2 Diabetes
9	Study On The Prediction Of Intracranial Hypertension Based On Waveform Feature Extraction And Support Vector Machines Classification
10	Application Of Support Vector Machine In TCM Syndrome Classification