Cost-sensitive learning is a popular way to solve the problem of class-imbalanced learning (CIL). Traditional cost-sensitive methods typically address CIL by assigning every minority instance a constant training-error penalty higher than that of the majority instances, ignoring the location information of each instance. Consequently, some studies have turned to personalized cost allocation, that is, assigning different costs according to the location information of different instances. These personalized cost-sensitive methods generally outperform traditional ones; however, their estimates of location information may be inaccurate because they are susceptible to data density. In addition, some algorithms apply data cleaning directly, either removing noisy instances or suppressing the propagation of noise. Although this can achieve good results on some data sets, it may provide highly inaccurate guidance on the complex and variable data distributions found in practice; as a result, it is difficult to identify the real noise, and the learned model is of low quality. To address these problems, this thesis proposes a more robust and general solution. The main contributions are as follows:

1. The RUE algorithm is proposed. Like the Pre-AdaCost algorithm, it adopts a random undersampling ensemble. An error-rate-feedback method is used to explore the distribution of instance location information indirectly, and instances are divided into noise, safe, and borderline categories according to how the feedback compares with a noise threshold (a sketch of this feedback mechanism is given after this summary). In this way the noise can be removed and the borderline region reinforced, so that the learning model pays more attention to fitting these instances.

2. The Pre-AdaCost algorithm is proposed, which can be regarded as a new cost-sensitive AdaBoost algorithm. Pre-AdaCost adds location-information pre-estimation and weight pre-assignment steps before running AdaCost. Instance location information is used to find and remove noise and then to guide the weight pre-allocation, and a robust indirect strategy based on the error-rate feedback of random undersampled sets is adopted. In particular, a noisy instance usually has a high error rate, a borderline instance a medium error rate, and a safe instance a low error rate; based on this assumption, the noise can be found and removed.

The difference between the RUE and Pre-AdaCost algorithms is that RUE takes the ratio between an instance's error rate and the total error rate of its class as the instance's cost, whereas Pre-AdaCost computes each instance's cost using adjustment factors that amplify the gap between the initial weights of hard-to-learn and easy-to-learn instances, and then normalizes the costs. What they have in common is that both minimize the noise in the instance space and prevent the learning model from overfitting. To demonstrate the effectiveness of these algorithms, we compared them with popular cost-sensitive learning algorithms, including the FSVM and WELM frameworks, on more than 40 data sets. Whether the relative density distribution is explored directly or indirectly, the RUE and Pre-AdaCost algorithms exhibit universality and robustness.
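
The following is a minimal sketch, not the thesis's reference implementation, of the error-rate-feedback idea shared by RUE and Pre-AdaCost: each training instance's error rate across classifiers fit on random undersampled subsets serves as an indirect estimate of its location, and instances are then partitioned into safe, borderline, and noise categories. The thresholds `noise_thr` and `safe_thr`, the base learner, and the function name are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def error_rate_feedback(X, y, n_rounds=20, noise_thr=0.7, safe_thr=0.3,
                        random_state=0):
    """Estimate per-instance error rates over a random undersampling
    ensemble and partition instances into safe / borderline / noise.
    Assumes a binary problem with minority class 1, majority class 0."""
    rng = np.random.default_rng(random_state)
    errors = np.zeros(len(y))  # misclassification counts per instance
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)

    for _ in range(n_rounds):
        # Random undersampling: all minority instances plus an
        # equal-sized random subset of the majority class.
        sub_maj = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, sub_maj])
        clf = DecisionTreeClassifier(max_depth=3, random_state=0)
        clf.fit(X[idx], y[idx])
        # Feedback: record which instances this member misclassifies.
        errors += (clf.predict(X) != y)

    rate = errors / n_rounds
    # Compare the feedback against the thresholds: a high error rate
    # suggests noise, a low one a safe instance, the rest borderline.
    label = np.where(rate >= noise_thr, "noise",
                     np.where(rate <= safe_thr, "safe", "borderline"))
    return rate, label
```

Under the description above, RUE would then drop the instances flagged as noise and derive per-instance costs from `rate` (relative to the class-wise total error rate), while Pre-AdaCost would use the partition to pre-assign AdaCost weights; both uses are paraphrases of the summary, not verified implementation details.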