
Research On Classification Algorithms For Tumor Gene Expression Data Based On ELM

Posted on: 2014-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: F M Jin
GTID: 2284330482456203
Subject: Biomedical engineering
Abstract/Summary:
In recent years, as the number of tumor patients has grown, tumor prevention and treatment have become a worldwide focus. According to statistics, malignant tumors now rank first among causes of death worldwide, ahead of heart disease and cerebrovascular disease. Most tumor diagnosis methods are based on morphology, which may lead to clinical differences among cases of the same tumor type. As a result, these diagnosis methods suffer from considerable limitations in treatment sensitivity. With the rapid development of gene chip technology, more and more tumor gene expression data can be measured. From the viewpoint of molecular biology, gene chip technology allows gene expression data to be analyzed and processed effectively, making early diagnosis and individualized treatment possible; both are significant for raising patients' survival rates. However, gene expression data typically has high dimensionality, an imbalanced class distribution, and a small number of samples. How to identify the limited set of disease-related genes in such high-dimensional data, i.e., the classification of gene expression data, has attracted increasingly wide attention from researchers.

This thesis focuses on the classification of gene expression data, using neural networks and machine learning to design classification models and propose classification algorithms. First, to address the unstable performance of a single Extreme Learning Machine (ELM), an Ensemble Algorithm with ELM Diversity (EAED) is adopted to build a classifier ensemble based on different measures of the models' outputs. The algorithm first judges the diversity of the ELM models according to these output measures, and then removes models according to their classification accuracy.
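The EAED steps summarized here can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the base learner is a textbook single-hidden-layer ELM (random sigmoid hidden layer, output weights from the Moore-Penrose pseudoinverse), diversity is measured as the pairwise disagreement rate of predicted labels, and the pruning rule (drop the less accurate model of any pair whose disagreement falls below a hypothetical threshold `min_div`) plus the final majority vote are assumptions. The names `ELM`, `disagreement`, `prune_by_diversity` and `majority_vote` are illustrative.

```python
import numpy as np


class ELM:
    """Basic single-hidden-layer Extreme Learning Machine with sigmoid hidden units."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer output H = g(XW + b) with a logistic activation g.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.classes = np.unique(y)
        # Input weights and biases are drawn at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # One-hot targets; output weights solved in closed form: beta = pinv(H) @ T.
        T = (y[:, None] == self.classes[None, :]).astype(float)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def decision(self, X):
        # Raw output scores, one column per class.
        return self._hidden(X) @ self.beta

    def predict(self, X):
        return self.classes[np.argmax(self.decision(X), axis=1)]


def disagreement(pred_a, pred_b):
    # Output-based diversity: fraction of samples on which two models disagree.
    return float(np.mean(pred_a != pred_b))


def prune_by_diversity(models, X_val, y_val, min_div=0.05):
    """Hypothetical pruning rule (assumption, not from the thesis): when two models
    disagree on fewer than min_div of the samples, keep only the more accurate one."""
    preds = [m.predict(X_val) for m in models]
    acc = [float(np.mean(p == y_val)) for p in preds]
    keep = []
    # Visit models from most to least accurate so the better of a similar pair survives.
    for i in sorted(range(len(models)), key=lambda k: -acc[k]):
        if all(disagreement(preds[i], preds[j]) >= min_div for j in keep):
            keep.append(i)
    return [models[i] for i in keep]


def majority_vote(models, X):
    # Assumes non-negative integer class labels; ties go to the smallest label.
    votes = np.stack([m.predict(X) for m in models]).astype(int)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

On easy, well-separated data most members make identical predictions, so the pruning step may keep only one of them; on real gene expression data `min_div` would be a parameter to tune.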
Thereafter, the selected classification models are combined by voting. The thesis then analyzes how the rejection cost and the misclassification cost affect classification performance, and designs a Cost-Sensitive ELM (CS-ELM) algorithm to reduce the decision risk and the average classification cost. By introducing cost sensitivity, the designed algorithm handles gene expression data with different costs significantly more effectively.

The adopted algorithms are analyzed theoretically and evaluated experimentally on various tumor and non-tumor data sets. The results show that the EAED algorithm achieves stable classification accuracy with fewer ELM models, and that the CS-ELM algorithm reduces the average misclassification cost and increases classification reliability. The work in this thesis therefore helps raise the classification accuracy of tumor gene expression data and addresses the challenging problem of gene classification. It has considerable theoretical and practical significance for research on high-dimensional, imbalanced gene expression data.
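The cost-sensitive decision with a reject option can be illustrated with the standard minimum-expected-cost (Bayes risk) rule. This is a hedged sketch, not the CS-ELM algorithm itself, whose details the abstract does not give. It assumes that ELM output scores are turned into posterior-like probabilities with a softmax, that `cost[i, j]` is a hypothetical matrix holding the cost of predicting class j when the true class is i, and that a single `reject_cost` applies to every rejected sample. The names `cost_sensitive_decide`, `average_cost` and the label `REJECT = -1` are illustrative.

```python
import numpy as np

REJECT = -1  # hypothetical label meaning "refuse to classify this sample"


def softmax(scores):
    # Assumption: softmax turns raw ELM output scores into posterior-like probabilities.
    z = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cost_sensitive_decide(scores, cost, reject_cost):
    """Minimum-expected-cost decision with a reject option.

    cost[i, j] is the cost of predicting class j when the true class is i;
    reject_cost is the cost of refusing to classify a sample.
    """
    p = softmax(scores)            # shape (n_samples, n_classes)
    risk = p @ cost                # risk[n, j] = sum_i p[n, i] * cost[i, j]
    best = np.argmin(risk, axis=1)
    min_risk = risk[np.arange(len(best)), best]
    # Reject whenever even the cheapest class decision is riskier than rejecting.
    return np.where(min_risk > reject_cost, REJECT, best)


def average_cost(y_true, y_pred, cost, reject_cost):
    # Incurred cost per sample: reject_cost if rejected, otherwise cost[true, predicted].
    total = [reject_cost if yp == REJECT else cost[yt, yp] for yt, yp in zip(y_true, y_pred)]
    return float(np.mean(total))
```

With `scores = [[4, 0], [0, 4], [0.1, 0]]`, `cost = [[0, 1], [5, 0]]` and `reject_cost = 0.4`, the rule returns `[0, 1, -1]`: the third sample is rejected because even its cheaper decision has an expected cost of about 0.52. Raising `reject_cost` to 10 makes the rule predict class 1 for that sample, although class 0 is more probable, because missing class 1 is five times as costly.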
Keywords: Gene expression data, Gene chip technology, Extreme learning machine, Cost sensitive