Cancer Multi-classification Based on an Ensemble Neural Network

Posted on: 2014-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: X J Lei
GTID: 2254330425481038
Subject: Computer software and theory
Abstract/Summary:
The number of cancer patients continues to rise, and the prevention and treatment of cancer remain difficult problems worldwide. There are many types of cancer, which makes a cure harder to achieve, and the first step of cancer therapy is to classify the cancer cells. Classification accuracy directly affects the patient's diagnosis. After the Human Genome Project, first proposed by the United States, was successfully completed, DNA microarray technology emerged and brought cancer research into a new era. DNA microarray technology can automatically, rapidly and efficiently measure the expression of tens of thousands of genes. Analyzing gene expression data at the molecular level gives a deeper understanding of cell physiology, such as which genes cause cancer and when cancer cells proliferate and metastasize. The technology plays a very important role in cancer diagnosis, in explaining the mechanism of cancer, and in drug development.

The advent of microarray technology makes cancer research more convenient, but it also brings new challenges in processing large amounts of biological data. Because high cost limits the number of experiments, DNA microarray data are characterized by small sample sizes, high dimensionality, high noise, high redundancy and uneven distribution. If such data are used directly to train a classifier, overfitting occurs, and many traditional classifiers even become invalid. To solve this problem, a subset of feature genes is usually extracted from the original data to reduce its dimensionality.

At the same time, multi-classification problems are difficult to solve, yet they often arise in real life. Compared with binary classification, a multi-classification model is not only more complex but also tends to achieve worse classification performance.
Microarray data are a typical case of "high-dimensional genes, few samples", which makes good classification accuracy even harder to obtain.

To solve the cancer multi-classification problem, this thesis studies two aspects: feature selection, and converting the multi-classification problem into multiple binary classification problems. By combining the advantage of the Filter method, which is easy to perform and runs quickly, with that of the Wrapper method, which can select feature genes effectively, we propose a novel hybrid feature gene selection method based on the BW ratio and the flexible neural tree (FNT). The idea of the method is as follows. First, the BW ratio method is used to select some informative genes. Then, according to a coding strategy, the multi-category data on the selected gene subset are divided into several two-category feature subsets. Finally, the FNT method is used to extract more characteristic genes from the selected gene subsets. This thesis adopts the "one-versus-one" encoding strategy to convert the multi-classification problem into multiple binary classification problems.

The flexible neural tree is a special kind of neural network: a tree-structured neural network with input variable selection, over-layer connections and different activation functions for different nodes. Based on pre-defined instruction/operator sets, a flexible neural tree model can be created and evolved.
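The first two steps of the method (BW-ratio filtering, then one-versus-one splitting) can be sketched as below. This is only an illustration under stated assumptions: it takes "BW ratio" to mean the ratio of between-group to within-group sums of squares that is commonly used to rank genes, and the function names, the toy expression matrix and the class labels "A", "B", "C" are hypothetical and not taken from the thesis.

```python
from itertools import combinations


def bw_ratio(X, y):
    """Between-group / within-group sum-of-squares ratio for each gene (column)."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall_mean = sum(col) / len(col)
        between = within = 0.0
        for k in classes:
            vals = [v for v, label in zip(col, y) if label == k]
            class_mean = sum(vals) / len(vals)
            between += len(vals) * (class_mean - overall_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in vals)
        # A gene with zero within-class spread separates the classes perfectly.
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def top_genes(X, y, n_keep):
    """Indices of the n_keep genes with the largest BW ratio."""
    scores = bw_ratio(X, y)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:n_keep]


def one_vs_one_subsets(X, y, genes):
    """One two-class subset per pair of classes, restricted to the chosen genes."""
    subsets = {}
    for a, b in combinations(sorted(set(y)), 2):
        subsets[(a, b)] = [
            ([row[j] for j in genes], label)
            for row, label in zip(X, y)
            if label in (a, b)
        ]
    return subsets


# Toy data (hypothetical): 6 samples, 3 genes, 3 classes.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 4.0, 0.9],
    [3.0, 5.5, 0.1],
    [3.1, 4.5, 0.8],
    [5.0, 5.0, 0.3],
    [5.2, 4.2, 0.7],
]
y = ["A", "A", "B", "B", "C", "C"]

selected = top_genes(X, y, n_keep=2)
pairs = one_vs_one_subsets(X, y, selected)
print(selected, sorted(pairs))
```

With three classes the split yields three binary problems, one per class pair. In the method described above, each such subset would then be passed to the FNT for further gene selection; that wrapper stage is not shown here.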
The FNT structure is developed using the probabilistic incremental program evolution (PIPE) algorithm, and the free parameters embedded in the neural tree are optimized by the particle swarm optimization (PSO) algorithm. The flexible neural tree can both classify and select features. We use artificial neural networks as the classifier to verify that the proposed hybrid feature gene selection method is more effective for classification.

In the experiments, the widely used MLL (three categories) and Brain (four categories) microarray data sets were classified. The results show that the presented method outperforms other methods, using fewer feature genes and achieving higher classification accuracy.
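The parameter-optimization step mentioned above can be illustrated with a minimal global-best PSO. This sketch makes several assumptions that do not come from the thesis: the inertia weight and acceleration coefficients are common textbook values, the function names are hypothetical, and a simple sphere function stands in for the real objective, which would be the classification error of a neural tree as a function of its free parameters.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, bound=1.0, seed=0):
    """Minimal global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(p):
    """Stand-in objective (hypothetical); minimum value 0 at the origin."""
    return sum(x * x for x in p)


best, best_val = pso_minimize(sphere, dim=3)
print(best_val)
```

To apply the same loop to a neural tree, `objective` would decode a particle into the tree's weights and activation parameters and return the training error. The structure search by PIPE is a separate step and is not sketched here.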
Keywords/Search Tags: DNA microarray, Gene selection, Flexible Neural Tree, Particle Swarm Optimization, Artificial Neural Network, multi-classification