Research On The Method Of Controlling The Diversity In Ensemble Learning

Posted on:2013-01-26

Degree:Master

Type:Thesis

Country:China

Candidate:P Li

Full Text:PDF

GTID:2248330371470855

Subject:Computer technology

Abstract/Summary:

Ensemble learning is one of the leading research directions in machine learning. It trains several base learners on a dataset and combines them under an ensemble strategy, in order to obtain higher accuracy than any single base learner. The diversity of the base learners is related to the accuracy of the ensemble, and this relationship is the key to improving the performance of ensemble learning algorithms. Current research in this area focuses mainly on designing appropriate diversity measures and on using the resulting findings to improve ensemble algorithms.
A series of large-scale experiments is conducted, covering dataset size, type of base classifier, ensemble strategy, diversity measure and accuracy measure, followed by a comprehensive comparative analysis of the results. Supported by these experimental results, a novel ensemble algorithm, the controlled randomness ensemble method, is proposed on the basis of the random decision trees (RDT) algorithm. This method introduces more randomness into the base classifiers to increase the diversity of the base learners and thereby improve the accuracy of the algorithm. Finally, experiments on a large-scale real-world dataset are conducted to verify the effectiveness of the proposed method. More specifically, the main work is as follows:
(1) Experiments are conducted using three stable base learners (naive Bayes, support vector machine and k-nearest neighbor) and three unstable base learners (neural network, decision tree and random decision trees) on 10 UCI datasets of different sizes. Two ensemble strategies, three diversity measures and two accuracy measures are used to evaluate the results, in order to reveal the relationship between diversity and accuracy. The results show that the DF diversity measure depicts this relationship most distinctly: greater diversity promotes accuracy.
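The abstract does not expand "DF". Assuming it denotes the pairwise double-fault measure from the classifier-diversity literature (the fraction of samples that two base classifiers both misclassify, averaged over all classifier pairs), a minimal Python sketch of how it could be computed is shown here. The function names `double_fault` and `mean_double_fault` are hypothetical, and the toy labels and predictions are invented for illustration; they are not data from the thesis. For this measure a lower value means the members rarely fail together, i.e. higher diversity.

```python
from itertools import combinations
from typing import Sequence


def double_fault(pred_a: Sequence[int], pred_b: Sequence[int], y: Sequence[int]) -> float:
    """Fraction of samples that BOTH classifiers misclassify (pairwise DF)."""
    if not (len(pred_a) == len(pred_b) == len(y)):
        raise ValueError("prediction and label vectors must have equal length")
    both_wrong = sum(1 for a, b, t in zip(pred_a, pred_b, y) if a != t and b != t)
    return both_wrong / len(y)


def mean_double_fault(predictions: Sequence[Sequence[int]], y: Sequence[int]) -> float:
    """Average DF over all unordered pairs of base classifiers.

    Lower values mean the members rarely fail on the same samples,
    which is read as higher ensemble diversity.
    """
    pairs = list(combinations(range(len(predictions)), 2))
    if not pairs:
        raise ValueError("need at least two classifiers")
    return sum(double_fault(predictions[i], predictions[j], y) for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    y = [0, 1, 1, 0, 1, 0]
    preds = [
        [0, 1, 0, 0, 1, 1],  # wrong on samples 2 and 5
        [0, 0, 1, 0, 1, 1],  # wrong on samples 1 and 5
        [1, 1, 1, 0, 0, 0],  # wrong on samples 0 and 4
    ]
    print(round(mean_double_fault(preds, y), 4))
```

Only the first pair fails together (on sample 5), so the three pairwise values are 1/6, 0 and 0, and the script prints 0.0556.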
(2) A follow-up question is then raised: does accuracy keep increasing as diversity increases? To answer it, random projection is introduced to inject additional randomness, and an experiment is run with an RDT-based bagging ensemble combined with random projection. The results show that this strategy does increase diversity but decreases accuracy, which means that more diversity does not always guarantee higher accuracy. Based on this result, a randomness control factor, cp, is introduced into the RDT algorithm. Experiments show that cp-RDT outperforms the original RDT on several datasets, which supports this conclusion.
(3) Finally, the cp-RDT algorithm is applied to mining clinical data on the Traditional Chinese Medicine (TCM) diagnosis and treatment of coronary heart disease, and a TCM syndrome diagnosis model for coronary heart disease is established. To a certain extent, the results further demonstrate the validity of the algorithm.
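The abstract names a randomness control factor cp but does not say how it enters RDT. The sketch below assumes one plausible reading, purely for illustration: at each tree node, with probability cp the split is chosen in the spirit of RDT (random feature, random threshold), and otherwise it is chosen greedily by Gini impurity. Under this assumption cp = 1 gives fully random trees, while cp = 0 gives identical greedy trees and therefore no diversity. Predictions average the leaf class distributions across trees, as RDT does. The names `CpRDTEnsemble`, `build` and `leaf_probs` are hypothetical, the random-projection bagging variant from (2) is not included, and this is not the thesis's cp-RDT.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence

Row = Sequence[float]


class Node:
    def __init__(self, counts: Counter, feature: Optional[int] = None,
                 threshold: float = 0.0, left=None, right=None):
        self.counts = counts        # class histogram of training rows reaching this node
        self.feature = feature      # None marks a leaf
        self.threshold = threshold
        self.left, self.right = left, right


def gini(labels: Sequence[int]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build(X: List[Row], y: List[int], depth: int, max_depth: int,
          cp: float, rng: random.Random) -> Node:
    counts = Counter(y)
    if depth >= max_depth or len(counts) <= 1:
        return Node(counts)
    n_features = len(X[0])
    if rng.random() < cp:
        # HYPOTHETICAL random step: random feature, random threshold in its observed range.
        f = rng.randrange(n_features)
        t = rng.uniform(min(r[f] for r in X), max(r[f] for r in X))
    else:
        # HYPOTHETICAL greedy step: best (feature, threshold) by weighted Gini impurity.
        best = None
        for f_c in range(n_features):
            for t_c in {r[f_c] for r in X}:
                left = [lab for r, lab in zip(X, y) if r[f_c] <= t_c]
                right = [lab for r, lab in zip(X, y) if r[f_c] > t_c]
                if not left or not right:
                    continue
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if best is None or score < best[0]:
                    best = (score, f_c, t_c)
        if best is None:
            return Node(counts)
        _, f, t = best
    li = [i for i, r in enumerate(X) if r[f] <= t]
    ri = [i for i, r in enumerate(X) if r[f] > t]
    if not li or not ri:  # degenerate split: stop here
        return Node(counts)
    return Node(counts, f, t,
                build([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, cp, rng),
                build([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, cp, rng))


def leaf_probs(node: Node, x: Row) -> dict:
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    total = sum(node.counts.values())
    return {c: k / total for c, k in node.counts.items()}


class CpRDTEnsemble:
    """Hypothetical cp-controlled random-tree ensemble (illustrative only)."""

    def __init__(self, n_trees: int = 10, max_depth: int = 5, cp: float = 1.0, seed: int = 0):
        self.n_trees, self.max_depth, self.cp = n_trees, max_depth, cp
        self.rng = random.Random(seed)
        self.trees: List[Node] = []

    def fit(self, X: List[Row], y: List[int]) -> "CpRDTEnsemble":
        # Like RDT, every tree sees the full training set; diversity comes from the splits.
        self.trees = [build(X, y, 0, self.max_depth, self.cp, self.rng)
                      for _ in range(self.n_trees)]
        return self

    def predict(self, x: Row) -> int:
        # Average the leaf class distributions over all trees, then take the argmax.
        agg: Counter = Counter()
        for tree in self.trees:
            for c, p in leaf_probs(tree, x).items():
                agg[c] += p / len(self.trees)
        return max(agg, key=agg.get)


if __name__ == "__main__":
    X = [[0.1, 1.0], [0.2, 0.9], [0.8, 0.2], [0.9, 0.1], [0.15, 0.8], [0.85, 0.3]]
    y = [0, 0, 1, 1, 0, 1]
    for cp in (0.0, 0.5, 1.0):
        model = CpRDTEnsemble(n_trees=15, max_depth=4, cp=cp, seed=42).fit(X, y)
        print(cp, [model.predict(x) for x in X])
```

Under this reading, sweeping cp on held-out data is one way to look for the point where added randomness stops helping accuracy, which is the trade-off described in (2).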

Keywords/Search Tags:

Machine Learning, Ensemble Learning, Diversity, Randomness, TCM Diagnosis and Treatment of Coronary Heart Disease

Related items

1	Research On Naïve Bayesian Classification And Its Application
2	Application Of Decision Tree Model In Whole Genome Association Of Coronary Heart Disease
3	Data Warehousing And Data Mining On The Clinical Database Of Coronary Treatment With Chinese Traditional Medicine
4	Research On Key Technologies For Combined Monitoring Of ECG And Blood Pressure
5	Research On The Application Of Association Analysis In Clinical Data Of Coronary Heart Disease
6	Research On Structural Diversity Of Ensemble Learning
7	Research On Recognition Of OCT Cardiovascular Vulnerable Plaque Based On Deep Learning
8	The Feature Extraction Of ST Segments Based On Wavelets Transform And Its Application In Coronary Heart Disease Diagnosis
9	Selective Ensemble Learning Algorithm Based On Pairwise Diversity Measures
10	Research Of Extreme Learning Machine Based On Ensemble Method