
The Research On Cancer Classification Algorithm With Jumping Emerging Pattern

Posted on: 2012-09-28
Degree: Master
Type: Thesis
Country: China
Candidate: D Li
GTID: 2234330395985591
Subject: Computer Science and Technology
Abstract/Summary:
Classification is an important research problem in data mining and machine learning; it aims to build a classifier from training instances in order to predict the classes of new instances. The recent development of microarray technology has supplied high-dimensional data to many fields, and such data has mainly been applied to the prediction and diagnosis of cancer. A Jumping Emerging Pattern (JEP) is a special pattern with strong discriminative power, which gives it a clear advantage in classification. Proposing a JEP-based cancer classification method is therefore of great significance. Building on the rapid development of DNA microarrays and gene patterns, this paper proposes a cancer classification method based on JEPs. To further improve classification accuracy, it also studies combined (ensemble) classification built on the JEP-based cancer classifier, which has research value and practical significance in biology and medicine. The work starts from an analysis of gene expression data. The main work is summarized as follows:

1. Preprocessing the gene expression data. DNA microarray technology produces high-dimensional, noisy data. This paper introduces an entropy-based discretization method to discretize continuous features, using the Minimal Description Length Principle to find the cut points. Entropy values are computed and used to choose the discriminative feature genes: the smaller the entropy value, the stronger the classification ability of the feature gene. The discretization method automatically removes many noisy features.
2. Proposing a more strongly discriminative pattern, called the Improved Jumping Emerging Pattern (IJEP). An IJEP has an infinite growth rate, and no proper subset of an IJEP is itself an IJEP. IJEPs are built from the feature genes obtained by the entropy-based discretization method. A Bayesian m-estimate is introduced when computing entropy, to overcome the defect of equating frequency with probability on small samples; this improves the reliability of the entropy.
3. Discovering IJEPs with border-based algorithms. The BORDER-DIFF algorithm discovers the different borders, and the MBD-LLBORDER algorithm discovers the IJEPs, which effectively reduces the mining time. Based on the IJEPs, this paper proposes a cancer classifier based on IJEP (CIJEP) and improves the computation of collective likelihood to make predictions reliable.
4. Using CIJEPs as base classifiers and applying ensemble machine learning to cancer classification, which yields two algorithms, Bag-CIJEP and Boost-CIJEP. Experiments on four datasets show that Bag-CIJEP and Boost-CIJEP can improve the accuracy of cancer classification.
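Two definitions used in the abstract, entropy computed with a Bayesian m-estimate and the jumping emerging pattern itself, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the toy "tumor"/"normal" data, the uniform priors, and the value m = 2 are all assumptions made for illustration. The minimality check is a brute-force test that no proper subset is a JEP; the thesis's exact IJEP definition and its border-based mining algorithms (BORDER-DIFF, MBD-LLBORDER) are not reproduced here.

```python
import math
from itertools import combinations


def m_estimate_entropy(class_counts, priors, m=2.0):
    """Entropy of a class distribution with m-estimate probabilities.

    p_c = (n_c + m * prior_c) / (n + m), so a class with zero observed
    count still gets nonzero probability when the sample is small.
    """
    n = sum(class_counts)
    if n + m == 0:
        return 0.0
    probs = [(n_c + m * p) / (n + m) for n_c, p in zip(class_counts, priors)]
    return -sum(p * math.log2(p) for p in probs if p > 0)


def support(itemset, dataset):
    """Fraction of instances (each a frozenset of items) containing itemset."""
    if not dataset:
        return 0.0
    return sum(1 for inst in dataset if itemset <= inst) / len(dataset)


def growth_rate(itemset, background, target):
    """Support in target divided by support in background (inf if the latter is 0)."""
    s_bg = support(itemset, background)
    s_t = support(itemset, target)
    if s_bg == 0:
        return math.inf if s_t > 0 else 0.0
    return s_t / s_bg


def is_jep(itemset, background, target):
    """A JEP occurs in the target class and never in the background class."""
    return growth_rate(itemset, background, target) == math.inf


def is_minimal_jep(itemset, background, target):
    """True if itemset is a JEP and no non-empty proper subset is a JEP."""
    if not is_jep(itemset, background, target):
        return False
    items = list(itemset)
    return not any(
        is_jep(frozenset(sub), background, target)
        for k in range(1, len(items))
        for sub in combinations(items, k)
    )


# Hypothetical discretized gene-expression instances (items are "gene=level").
TUMOR = [
    frozenset({"g1=high", "g2=low"}),
    frozenset({"g1=high", "g2=high"}),
    frozenset({"g1=low", "g2=low"}),
]
NORMAL = [
    frozenset({"g1=low", "g2=high"}),
    frozenset({"g1=low", "g2=low"}),
    frozenset({"g1=low", "g2=high"}),
]

if __name__ == "__main__":
    for pattern in (frozenset({"g1=high"}), frozenset({"g1=high", "g2=low"})):
        print(sorted(pattern),
              "JEP:", is_jep(pattern, NORMAL, TUMOR),
              "minimal:", is_minimal_jep(pattern, NORMAL, TUMOR))
    # Raw frequencies (m=0) versus m-estimate smoothing on a 3-vs-0 split.
    print(m_estimate_entropy([3, 0], [0.5, 0.5], m=0.0),
          m_estimate_entropy([3, 0], [0.5, 0.5], m=2.0))
```

In this toy data, `{"g1=high"}` never appears in the normal class, so its growth rate is infinite; the larger pattern `{"g1=high", "g2=low"}` is also a JEP but is not minimal, because it contains that smaller JEP. The entropy comparison shows why smoothing matters on small samples: with raw frequencies a 3-versus-0 split looks perfectly pure, whereas the m-estimate keeps some uncertainty.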
Keywords/Search Tags: DNA microarray, Cancer classification, Jumping Emerging Pattern, Collective likelihood, Ensemble machine learning