Cost-sensitive learning is a popular way to solve the problem of class-imbalanced learning (CIL). Traditional cost-sensitive methods typically address CIL by assigning every minority instance a constant training-error penalty higher than that of the majority instances, ignoring the location information of each instance. Consequently, some studies have turned to personalized cost allocation, that is, assigning different costs according to the location information of different instances. These personalized cost-sensitive methods generally outperform traditional ones; however, their estimates of location information may be inaccurate because they are susceptible to data density. In addition, some algorithms apply data cleaning directly, either removing noisy instances or suppressing the propagation of noise. Although this can achieve good results on some data sets, it may provide highly inaccurate guidance on the complex and variable data distributions found in practice; as a result, it is difficult to identify the real noise, and the learned model is of low quality. To address these problems, this thesis proposes a more robust and general solution. The main contributions are as follows:

1. The RUE algorithm is proposed. Like the Pre-AdaCost algorithm, it adopts a random undersampling ensemble. An error-rate-feedback method is used to explore the distribution of instance location information indirectly, and instances are divided into noise, safe, and borderline categories according to how the feedback compares with a noise threshold (a sketch of this feedback mechanism is given after this summary). In this way the noise can be removed and the borderline region reinforced, so that the learning model pays more attention to fitting these instances.

2. The Pre-AdaCost algorithm is proposed, which can be regarded as a new cost-sensitive AdaBoost algorithm. Pre-AdaCost adds location-information pre-estimation and weight pre-assignment steps before running AdaCost. Instance location information is used to find and remove noise and then to guide the weight pre-allocation, and a robust indirect strategy based on the error-rate feedback of random undersampled sets is adopted. In particular, a noisy instance usually has a high error rate, a borderline instance a medium error rate, and a safe instance a low error rate; based on this assumption, the noise can be found and removed.

The difference between the RUE and Pre-AdaCost algorithms is that RUE takes the ratio between an instance's error rate and the total error rate of its class as the instance's cost, whereas Pre-AdaCost computes each instance's cost using adjustment factors that amplify the gap between the initial weights of hard-to-learn and easy-to-learn instances, and then normalizes the costs. What they have in common is that both minimize the noise in the instance space and prevent the learning model from overfitting. To demonstrate the effectiveness of these algorithms, we compared them with popular cost-sensitive learning algorithms, including the FSVM and WELM frameworks, on more than 40 data sets. Whether the relative density distribution is explored directly or indirectly, the RUE and Pre-AdaCost algorithms exhibit universality and robustness.
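
The following is a minimal sketch, not the thesis's reference implementation, of the error-rate-feedback idea shared by RUE and Pre-AdaCost: each training instance's error rate across classifiers fit on random undersampled subsets serves as an indirect estimate of its location, and instances are then partitioned into safe, borderline, and noise categories. The thresholds `noise_thr` and `safe_thr`, the base learner, and the function name are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def error_rate_feedback(X, y, n_rounds=20, noise_thr=0.7, safe_thr=0.3,
                        random_state=0):
    """Estimate per-instance error rates over a random undersampling
    ensemble and partition instances into safe / borderline / noise.
    Assumes a binary problem with minority class 1, majority class 0."""
    rng = np.random.default_rng(random_state)
    errors = np.zeros(len(y))  # misclassification counts per instance
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)

    for _ in range(n_rounds):
        # Random undersampling: all minority instances plus an
        # equal-sized random subset of the majority class.
        sub_maj = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, sub_maj])
        clf = DecisionTreeClassifier(max_depth=3, random_state=0)
        clf.fit(X[idx], y[idx])
        # Feedback: record which instances this member misclassifies.
        errors += (clf.predict(X) != y)

    rate = errors / n_rounds
    # Compare the feedback against the thresholds: a high error rate
    # suggests noise, a low one a safe instance, the rest borderline.
    label = np.where(rate >= noise_thr, "noise",
                     np.where(rate <= safe_thr, "safe", "borderline"))
    return rate, label
```

Under the description above, RUE would then drop the instances flagged as noise and derive per-instance costs from `rate` (relative to the class-wise total error rate), while Pre-AdaCost would use the partition to pre-assign AdaCost weights; both uses are paraphrases of the summary, not verified implementation details.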