Study On Multi-class Classification With Support Vector Machine Decision Tree For Unbalanced Data

Posted on: 2018-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
Full Text: PDF
GTID: 2359330518475023
Subject: Statistics
Abstract/Summary:
In this thesis, we consider the multi-class classification problem with unbalanced data. When classification algorithms are applied to unbalanced data, they usually perform better on the majority class. It is therefore necessary to develop methods for handling unbalanced data that improve the performance of classification algorithms on both the majority and the minority classes. For multi-class classification, many algorithms have been developed on the basis of the support vector machine, which was originally designed for binary classification. Among them, the support vector machine decision tree algorithm is often used. We propose a new multi-class classification algorithm by improving the support vector machine decision tree algorithm. Our main contributions are as follows.

(1) Most feature selection methods based on information entropy are carried out on the whole sample space. In practice, feature selection is a dynamic process. Hence we propose a dynamic feature selection method based on information entropy to find the best feature set.

(2) Oversampling usually increases the number of repeated samples, which in turn increases training time, while under-sampling tends to lose useful information. We therefore develop a hybrid method based on Neighbor Clean Under-sampling and the Synthetic Minority Oversampling Technique (SMOTE). It filters the boundary data points of the majority class according to certain rules and processes the minority class with SMOTE. This is the first novelty of this thesis.

(3) The support vector machine decision tree multi-classification method suffers from error accumulation: if the decision tree misclassifies at some node, the nodes that follow are more likely to misclassify as well. Unbalanced data can further increase this error accumulation. To reduce it, we propose to optimize the decision tree at each step to guarantee high classification accuracy, and we apply the method described in (2) to deal with the unbalanced data. This is the second novelty of this thesis.

(4) The improved support vector machine decision tree is evaluated in simulations on five standard UCI data sets. Numerical analysis shows that the improved multi-class classification algorithm raises classification accuracy both overall and for the minority class. Although training time increases slightly, the results are still acceptable. In addition, we apply the new method proposed in this thesis to a wine quality classification problem. The results show that our algorithm outperforms One Versus One SVM and Directed Acyclic Graph SVM.
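As an illustration of the oversampling half of the hybrid method in (2), the following is a minimal sketch of SMOTE-style interpolation: each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, the parameters `n_new`, `k` and `seed`, and the toy data are assumptions made for this example only. The abstract does not give the thesis's actual settings or the filtering rules of the Neighbor Clean step, so neither is reproduced here.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Illustrative only: assumes X_min holds at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    k = min(k, n - 1)  # a point cannot have more neighbours than other points

    # Pairwise Euclidean distances within the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest minority neighbours of every sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for each synthetic point.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position between base and neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy minority class: the four corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_min, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the region already occupied by the minority class instead of duplicating existing samples exactly, which is the drawback of plain oversampling that the abstract mentions.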
Keywords/Search Tags: Multi-class Classification, Decision Tree, Support Vector Machine, Dynamic Information Entropy, Unbalanced Data