Research on Training Data Selection for SVM Based on RF

Posted on: 2018-12-29 | Degree: Master | Type: Thesis
Country: China | Candidate: S S Wei
GTID: 2348330542952395 | Subject: Applied Mathematics
Abstract/Summary:
The support vector machine (SVM) is a machine learning method grounded in statistical learning theory and devoted to classification and prediction with finite samples. It has a solid theoretical foundation, a flexible algorithm, and high classification accuracy. In recent years it has become a research hotspot in machine learning and artificial intelligence, and it has proven highly useful in many areas such as text classification and handwriting recognition. Compared with other classification algorithms, the SVM offers a simple structure, a globally optimal solution, and strong generalization. As a relatively new technique, however, it still needs further exploration and improvement in data mining. In this thesis, we introduce the Random Packet Sampling Ensemble method, an improved random forest (RF), into the data preprocessing stage of the SVM. First, the training data for the base classifiers are selected by random sampling. Compared with existing algorithms, this sampling method both speeds up data selection and preserves the randomness of the base classifiers' training samples. Next, the training samples for the SVM are selected based on the ensemble margin. Finally, the SVM is trained on the selected training set to produce the classification results. Experiments show that, compared with traditional data selection algorithms, the new algorithm selects data faster and reduces the time and space complexity of training while maintaining the SVM's classification accuracy.

We also apply the random packet sampling ensemble algorithm to the classification of imbalanced data in order to improve the performance of the base classifiers. First, the algorithm divides the positive samples into several groups, each containing as many samples as the negative class. Each group is then combined with the negative samples to train a base classifier, and the final classification result is determined by the ensemble rule. Experimental results with decision trees as base classifiers show that the random packet sampling ensemble algorithm obtains better classification results than other traditional methods for handling imbalanced data.
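The first part of the method (random sampling for the base classifiers, then ensemble-margin selection of SVM training data) can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state. "Random packet sampling" is modeled as shuffling the training set once and splitting it into `n_packets` disjoint packets, one per base classifier. Each base classifier is a decision stump (a one-level decision tree) rather than a full random-forest tree, and labels are -1/+1. The ensemble margin of a sample is taken as (votes for its true label - votes for the other label) / number of classifiers. The `keep_ratio` fraction of samples with the smallest margins, meaning those nearest the ensemble's decision boundary or misclassified, is kept for SVM training. All function names and parameters are hypothetical, and the SVM training step itself is omitted.

```python
import random
from collections import Counter


def random_packets(n, k, rng):
    """Shuffle indices 0..n-1 once and split them into k disjoint packets
    (assumed model of random packet sampling)."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_stump(X, y):
    """Fit a one-level decision tree on labels in {-1, +1}
    (hypothetical stand-in for an RF base tree)."""
    best = (len(y) + 1, 0, 0.0, 1)  # (errors, feature, threshold, sign)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for s in (1, -1):
                # Predict s at or below the threshold, -s above it.
                err = sum((s if row[f] <= t else -s) != yi for row, yi in zip(X, y))
                if err < best[0]:
                    best = (err, f, t, s)
    _, f, t, s = best
    return lambda x: s if x[f] <= t else -s


def ensemble_margins(models, X, y):
    """Binary ensemble margin: (votes for true label - votes for the other) / T."""
    margins = []
    for x, yi in zip(X, y):
        votes = Counter(m(x) for m in models)
        margins.append((votes[yi] - votes[-yi]) / len(models))
    return margins


def select_svm_training_data(X, y, n_packets=5, keep_ratio=0.3, seed=0):
    """Return sorted indices of the low-margin samples to train the SVM on."""
    rng = random.Random(seed)
    models = [
        train_stump([X[i] for i in p], [y[i] for i in p])
        for p in random_packets(len(X), n_packets, rng)
        if p  # skip empty packets when n_packets > len(X)
    ]
    margins = ensemble_margins(models, X, y)
    n_keep = max(1, int(keep_ratio * len(X)))
    by_margin = sorted(range(len(X)), key=lambda i: margins[i])
    return sorted(by_margin[:n_keep])
```

For example, `select_svm_training_data(X, y, n_packets=4, keep_ratio=0.25)` on 40 labeled points returns the indices of 10 points, which would then be passed to any SVM trainer. Because each stump sees only one packet, every base classifier is trained on a different random subset of the data, which is the source of the ensemble's diversity in this sketch.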
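The imbalanced-data procedure can likewise be sketched under stated assumptions. As in the abstract, the positive class (+1) is the larger one and is split into groups. The number of groups is assumed to be `len(pos) // len(neg)`, so each group holds roughly as many positives as there are negatives (slightly more when the sizes do not divide evenly). The unspecified "ensemble rule" is taken to be a simple majority vote with ties resolved toward the smaller negative class. A nearest-centroid classifier stands in for the decision trees used in the thesis's experiments. All names here are hypothetical, and any `train(X, y)` callable that returns a predictor can be swapped in.

```python
import random


def nearest_centroid(X, y):
    """Hypothetical base learner; the thesis's experiments use decision trees."""
    centroids = {}
    for label in set(y):
        rows = [x for x, yi in zip(X, y) if yi == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # Assign x to the label whose centroid is closest (squared Euclidean distance).
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def random_packet_ensemble(pos, neg, train=nearest_centroid, seed=0):
    """pos: samples of the larger positive class (+1); neg: the smaller negative class (-1)."""
    rng = random.Random(seed)
    k = max(1, len(pos) // len(neg))  # number of positive groups
    idx = list(range(len(pos)))
    rng.shuffle(idx)
    models = []
    for g in range(k):
        group = [pos[i] for i in idx[g::k]]  # about len(neg) positives per group
        X = group + neg  # balanced training set for this base classifier
        y = [1] * len(group) + [-1] * len(neg)
        models.append(train(X, y))

    def predict(x):
        # Majority vote over the +1/-1 outputs; ties go to the smaller class (-1).
        return 1 if sum(m(x) for m in models) > 0 else -1

    return predict, models
```

Every base classifier sees all negative samples but only its own share of the positives, so each is trained on a balanced set while, across the ensemble, no positive sample is discarded.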
Keywords/Search Tags: Support vector machine, Imbalanced data, Classification, Data selection