Research On Training Data Selection For SVM Based On RF

Posted on:2018-12-29

Degree:Master

Type:Thesis

Country:China

Candidate:S S Wei

Full Text:PDF

GTID:2348330542952395

Subject:Applied Mathematics

Abstract/Summary:

The support vector machine (SVM) is a machine learning method based on statistical learning theory that addresses classification and prediction from finite samples. It has a solid theoretical basis, flexible algorithms, and high classification accuracy. In recent years it has become a research hotspot in machine learning and artificial intelligence, and it has shown great practical utility in areas such as text classification and handwriting recognition. Compared with other classification algorithms, SVM offers a simple structure, a globally optimal solution, and good generalization. However, as a relatively new technology, SVM still needs to be explored and improved in the field of data mining.

In this thesis we introduce the Random Packet Sampling Ensemble method into the data preprocessing stage of SVM. Random Packet Sampling Ensemble is an improved version of RF. First, the training data for the base classifiers are selected by random sampling. Compared with the existing algorithm, this sampling method not only speeds up data selection but also ensures the randomness of the base classifiers' training samples. Next, the training samples for the SVM are selected according to the ensemble margin. Finally, the SVM is trained on the selected training set to obtain the classification results. Experiments show that, compared with traditional data selection algorithms, the new algorithm speeds up data selection and reduces the time and space complexity of training while maintaining the classification accuracy of the SVM.

We also apply the random packet sampling ensemble algorithm to the classification of imbalanced data, in order to improve the classification performance of the base classifiers. First, the positive samples are divided into several groups by the random packet ensemble algorithm, so that the number of positive samples in each group equals the number of negative samples. Each group is then combined with the negative samples to train a base classifier, and the final classification result is determined by the ensemble rules. Experimental results based on decision trees show that the random packet sampling ensemble algorithm obtains better classification results than other traditional methods for handling imbalanced data.
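The data selection procedure summarized in the abstract (random grouping, base classifiers, ensemble margin, sample selection) can be illustrated with a minimal sketch. The abstract does not give the details, so everything specific here is an assumption: the function names are hypothetical, decision stumps stand in for the base classifiers, "random packets" are read as disjoint random groups, and the margin is taken as (votes for the true class minus the most votes for any other class) divided by the number of voters. The sketch keeps the lowest-margin samples, on the assumption that these lie near the class boundary; the final SVM training step is not shown.

```python
import random
from collections import Counter

def random_groups(data, n_groups, rng):
    """Shuffle the data and split it into disjoint random groups (assumed 'packets')."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]

def train_stump(samples):
    """Fit a one-feature threshold classifier by exhaustive search (stand-in base learner)."""
    best = None
    for f in range(len(samples[0][0])):
        for x, _ in samples:
            t = x[f]
            for left, right in ((0, 1), (1, 0)):
                errors = sum((left if xi[f] <= t else right) != yi
                             for xi, yi in samples)
                if best is None or errors < best[0]:
                    best = (errors, f, t, left, right)
    _, f, t, left, right = best
    return lambda x: left if x[f] <= t else right

def ensemble_margins(data, n_groups=10, seed=0):
    """Train one stump per random group and return the ensemble margin of each sample."""
    rng = random.Random(seed)
    stumps = [train_stump(g) for g in random_groups(data, n_groups, rng)]
    margins = []
    for x, y in data:
        votes = Counter(s(x) for s in stumps)
        other = max((v for label, v in votes.items() if label != y), default=0)
        margins.append((votes[y] - other) / len(stumps))
    return margins

def select_low_margin(data, margins, keep):
    """Keep the `keep` samples with the smallest margins (assumed to be near the boundary)."""
    order = sorted(range(len(data)), key=lambda i: margins[i])
    return [data[i] for i in order[:keep]]

# Synthetic two-class data: two overlapping Gaussian blobs in 2-D (illustrative only).
rng = random.Random(1)
data = ([((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(100)] +
        [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(100)])

margins = ensemble_margins(data)
selected = select_low_margin(data, margins, keep=40)
print(len(selected), "of", len(data), "samples kept for SVM training")
```

The reduced set `selected` would then be passed to an SVM trainer in place of the full data set; how many samples to keep is a tuning choice that the abstract does not specify.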

Keywords/Search Tags:

Support vector machine, Imbalanced data, Classification, Data selection

Related items

1	Support Vector Machine Based Classification Algorithms Research For Imbalanced Data
2	Feature Selection And Classification For Imbalanced Medical Data
3	The Research Of Imbalanced Data Classification Algorithm Based On Support Vector Machine
4	Research On Support Vector Machine Classification Method For Imbalanced Datasets
5	Support Vector Machine Based Classification Models And Algorithms Research For Imbalanced Data
6	Research And Application Of Imbalance Data Classification Based On Support Vector Machine
7	Research On Classification Algorithms For Imbalanced Dataset
8	Research On Support Vector Machine Models And Algorithms For Imbalanced Data
9	Research On Classification Algorithm Of Data Mining Based On Improved Support Vector Machine
10	Researches On Optimization Modeling Methods Of Support Vector Machine