Study On Multi-class Classification With Support Vector Machine Decision Tree For Unbalanced Data

Posted on:2018-12-04

Degree:Master

Type:Thesis

Country:China

Candidate:J Li

Full Text:PDF

GTID:2359330518475023

Subject:Statistics

Abstract/Summary:

In this thesis, we consider the multi-class classification problem with unbalanced data. When classification algorithms are applied to unbalanced data, they usually perform better on the majority class. It is therefore necessary to develop methods for handling unbalanced data that improve classification performance on both the majority and the minority classes. For the multi-class classification problem, many algorithms have been developed from the support vector machine, which was originally invented for binary classification. Among them, the support vector machine decision tree algorithm is often used. We propose a new multi-class classification algorithm by improving the support vector machine decision tree algorithm. Our main contributions are as follows.

(1) Most feature selection methods based on information entropy are carried out on the whole sample space. In fact, feature selection is a dynamic process. Hence we propose a dynamic feature selection method based on information entropy to find the best feature set.

(2) Oversampling usually increases the number of repeated samples, which in turn increases training time, while under-sampling tends to lose useful information. We therefore develop a hybrid method based on Neighbor Clean Under-sampling and the Synthetic Minority Oversampling Technique (SMOTE), which filters the boundary data points of the majority class according to certain rules and also processes the minority class with SMOTE. This is the first novelty of this thesis.

(3) The support vector machine decision tree multi-class method suffers from error accumulation: if the decision tree misclassifies at some node, the nodes that follow will misclassify with higher probability. Unbalanced data can further increase the error accumulation. To reduce it, we propose to optimize the decision tree at each step to guarantee high classification accuracy, and we apply the method described in (2) to deal with the unbalanced data. This is the second novelty of this thesis.

(4) The improved support vector machine decision tree is tested on five UCI standard data sets. Numerical analysis shows that the improved support vector machine decision tree multi-class classification algorithm improves classification accuracy both overall and for the minority class. Although the training time increases slightly, the results are still acceptable. In addition, we apply the new method proposed in this thesis to a wine quality classification problem. The results show that our algorithm outperforms One Versus One SVM and Directed Acyclic Graph SVM.
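
As an illustration of the SMOTE step named in contribution (2), the following is a minimal sketch of the standard SMOTE interpolation idea: a synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. It is not the thesis's hybrid method, and it omits the majority-class cleaning step entirely. The function name `smote_sketch`, the parameters `k` and `seed`, and the toy data are assumptions made for this example.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points built by SMOTE-style interpolation.

    Hypothetical helper for illustration only; `minority` is a list of
    equal-length numeric tuples with at least two elements.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point lies on the segment between base and its neighbour
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Toy minority class with two features (made-up values)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point is a convex combination of two existing minority samples, the new points are not exact duplicates in general, which is the property the abstract contrasts with plain oversampling by repetition.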

Keywords/Search Tags:

Multi-class Classification, Decision Tree, Support Vector Machine, Dynamic Information Entropy, Unbalanced Data

Related items

1	Research On Support Vector Machine-Decision Tree Arithmetic And It's Application
2	The Evaluation Model And Empirical Research Of Customers' Loan Approval Based On Decision Tree And Support Vector Machine Algorithm
3	Research On Quality Stock Selected Based On Support Vector Machine
4	Research And Application Of Support Vector Machine On Imbalanced Data Classification
5	Personal Credit Scoring Based On Classification Tree And Support Vector Machines
6	Research On Classification Predication Of Different Stock Returns Based On Support Vector Machine
7	Study On Support Vector Machines Classification Methods And Their Application In Text Categorization
8	Study On The Application Of Support Vector Machine With Entropy Method In Coal Mine Safety Evaluation
9	Research On Support Vector Machine Models And Algorithms Based On Structural Information
10	Study On The Information Analysis And Decision Making Method Of Imports And Exports In Foreign Trade Of Chongqing Based On The Support Vector Machine