| ABSTRACT: Cancer is one of the most dangerous diseases threatening human health worldwide, and early detection is of great importance for diagnosis and treatment. Advances in gene chip technology have promoted cancer research at the molecular level. Mining useful knowledge from massive cancer gene expression profile data gives a more comprehensive understanding of the genetic nature of cancer and of the relationship between cancer and genes. This knowledge plays a vital role in clinical diagnosis and treatment, in further cancer research, in uncovering the pathogenesis of cancer, and in developing new drugs. Gene expression profile data are characterized by small sample sizes, high dimensionality, and nonlinearity. To address these characteristics, this article establishes a model that takes the AdaBoost algorithm as its foundation and cascades it with an SVM classifier and with a single-gene weak classifier, respectively, following the co-training idea from machine learning. Both methods share the advantage of being suitable for the linearly non-separable case. Classification errors arise when samples are assigned to the wrong class, and the AdaBoost-SVM method aims to reduce the number of such misclassified samples: by changing the weights of the training examples during the re-sampling process of AdaBoost, it leaves fewer misclassified examples. In experiments on a real colon cancer gene expression data set, we select 20 of the 2000 genes as classification feature genes. Cross-validation experiments show that the AdaBoost-SVM algorithm achieves good classification performance. Finally, we extend the AdaBoost-SVM model so that it can incorporate prior knowledge, which improves the reliability of classification. |
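
The sample reweighting described in the abstract can be illustrated with a minimal sketch of AdaBoost over single-gene weak classifiers. This is not the authors' implementation: the SVM component is omitted, each weak classifier is assumed to be a simple threshold on one gene's expression value, and the function names (`train_stump`, `adaboost_fit`, `adaboost_predict`) and the six-sample toy data are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, gene_index, threshold, polarity) of the best
    single-gene threshold classifier under sample weights w."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: one below the minimum, then midpoints.
        thresholds = [values[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for polarity in (1, -1):
                err = 0.0
                for row, label, weight in zip(X, y, w):
                    pred = polarity if row[j] > t else -polarity
                    if pred != label:
                        err += weight
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best

def stump_predict(stump, row):
    _, j, t, polarity = stump
    return polarity if row[j] > t else -polarity

def adaboost_fit(X, y, n_rounds=3):
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # weight of this weak classifier
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correctly classified ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Made-up toy data: rows are samples, columns are two hypothetical genes.
# Labels: +1 and -1 stand for the two classes (e.g. tumor vs. normal).
X = [[1.0, 0.5], [2.0, 0.4], [3.0, 0.5], [4.0, 0.4], [5.0, 0.5], [6.0, 0.4]]
y = [1, 1, -1, -1, 1, 1]

model = adaboost_fit(X, y, n_rounds=3)
train_pred = [adaboost_predict(model, row) for row in X]
```

No single threshold on either gene separates this toy set, yet three reweighted rounds are enough to fit it, which is the effect the abstract attributes to the re-sampling step. In the paper's setting the weak learner would be an SVM or a single-gene classifier trained on the 20 selected genes; that pipeline is not reproduced here.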