
Research On The Method Of Controlling The Diversity In Ensemble Learning

Posted on: 2013-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: P Li
Full Text: PDF
GTID: 2248330371470855
Subject: Computer technology
Abstract/Summary:
Ensemble learning is one of the most active research directions in machine learning. It trains several base learners on a dataset and combines them under an ensemble strategy in order to achieve higher accuracy than a single base learner. There is a relationship between the diversity of the learners and the accuracy of the ensemble, and this relationship is the key to improving the performance of ensemble learning algorithms. Current research focuses mainly on how to design an appropriate diversity measure and how to use the findings to improve ensemble algorithms.
A series of large-scale experiments is conducted, covering dataset size, type of base classifier, ensemble strategy, diversity measure, and accuracy measure, followed by a comprehensive comparative analysis of the results. Supported by these experimental results and taking the Random Decision Trees (RDT) algorithm as the basis, a novel ensemble algorithm, the controlled randomness ensemble method, is proposed. This method introduces more randomness into the base classifier to increase base learner diversity and thereby improve the accuracy of the algorithm. Finally, an experiment on a large-scale real dataset is conducted to verify the effectiveness of the proposed method. More specifically, the main work is as follows:
(1) Experiments are conducted using three stable base learners (naive Bayes, support vector machine, and k-nearest neighbor) and three unstable base learners (neural network, decision tree, and random decision trees). The experimental data are 10 UCI datasets of different scales. Two ensemble strategies, three diversity measures, and two accuracy measures are used to evaluate the final results, in order to reveal the relationship between diversity and accuracy. The results show that the DF diversity measure depicts this relationship most distinctly: diversity promotes accuracy.
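As an illustration of how a pairwise diversity measure can be computed, the following is a minimal sketch that assumes "DF" refers to the double-fault measure, which the abstract does not spell out. The function names and the toy labels are hypothetical and are not taken from the thesis. Under this measure, a lower value means the two classifiers fail together less often, which is usually read as higher diversity.

```python
from itertools import combinations


def double_fault(y_true, pred_a, pred_b):
    """Fraction of samples that BOTH classifiers misclassify.

    Assumption: this is the standard double-fault measure; the abstract
    only names the measure "DF" without defining it.
    """
    both_wrong = sum(
        1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b != t
    )
    return both_wrong / len(y_true)


def mean_pairwise_double_fault(y_true, predictions):
    """Average the double-fault value over all pairs of base learners."""
    pairs = list(combinations(predictions, 2))
    return sum(double_fault(y_true, a, b) for a, b in pairs) / len(pairs)


# Toy data (hypothetical): true labels and predictions of three base learners.
y_true = [1, 0, 1, 1, 0]
preds = [
    [1, 0, 0, 1, 1],  # learner 0 is wrong on samples 2 and 4
    [1, 1, 0, 1, 0],  # learner 1 is wrong on samples 1 and 2
    [0, 0, 1, 1, 1],  # learner 2 is wrong on samples 0 and 4
]

ensemble_df = mean_pairwise_double_fault(y_true, preds)
```

Learners 1 and 2 never fail on the same sample, so their pairwise value is 0, while learners 0 and 1 share one error out of five samples.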
(2) A follow-up question is then raised: does accuracy always increase with diversity? To answer it, random projection is introduced as a source of additional randomness. An experiment on an RDT-based bagging ensemble with random projection shows that this strategy does yield more diversity, but accuracy decreases, which means that more diversity does not always guarantee higher accuracy. Based on this result, a randomness control factor cp is introduced into the RDT algorithm. The experimental results show that cp-RDT outperforms the original RDT on several datasets, which supports this finding.
(3) Finally, the cp-RDT algorithm is applied to mining clinical data on the TCM (traditional Chinese medicine) diagnosis and treatment of coronary heart disease, and a model for diagnosing coronary heart disease by TCM syndrome is established. To a certain extent, the results further demonstrate the validity of the algorithm.
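The random projection step mentioned in (2) can be sketched as follows. This is only an illustration of a generic Gaussian random projection, not the thesis's implementation: the function names, the scaling choice, and the toy data are assumptions. The control factor cp is not defined in the abstract, so it is deliberately not modeled here.

```python
import random


def random_projection_matrix(n_features, n_components, seed=None):
    """Build an n_features x n_components Gaussian projection matrix.

    Entries are drawn from N(0, 1) and scaled by 1/sqrt(n_components),
    a common convention (assumed here, not stated in the abstract).
    """
    rng = random.Random(seed)
    scale = 1.0 / n_components ** 0.5
    return [
        [rng.gauss(0.0, 1.0) * scale for _ in range(n_components)]
        for _ in range(n_features)
    ]


def project(X, R):
    """Map each row of X into the lower-dimensional space (X times R)."""
    columns = list(zip(*R))  # each column of R has n_features entries
    return [
        [sum(x_i * r_i for x_i, r_i in zip(row, col)) for col in columns]
        for row in X
    ]


# Toy data (hypothetical): two samples with four features each.
X = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.0, -1.0, 2.0]]

# Each base learner could receive its own projection by varying the seed,
# which is one way of injecting extra randomness into the ensemble.
Z_a = project(X, random_projection_matrix(4, 2, seed=0))
Z_b = project(X, random_projection_matrix(4, 2, seed=1))
```

Training each base learner on a differently seeded projection gives every learner a different view of the same samples, which is how such a step would increase diversity.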
Keywords/Search Tags:Machine Learning, Ensemble Learning, Diversity, Randomness, TCM Diagnosis and Treatment of Coronary Heart Disease