
Research On Direct Optimization Of AUC Algorithm For Large-scale Data

Posted on: 2020-11-10
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
Full Text: PDF
GTID: 2428330575954496
Subject: Computer Science and Technology
Abstract/Summary:
Machine learning and data mining have recently become hot research topics in artificial intelligence. Binary classification, an important subfield of both, has received growing attention. Most traditional classification algorithms measure performance by accuracy; however, accuracy is not an appropriate metric for applications with class-imbalanced data. Directly optimizing an imbalance-aware measure has attracted many researchers because the objective function is then consistent with the evaluation criterion, and in particular much attention has been paid to designing classifiers that directly optimize AUC. In real applications such as information retrieval and financial fraud detection, however, most AUC maximization algorithms are designed for batch learning, so they suffer from high computational cost on large data sets, making them unsuitable for large-scale applications. A natural way to tackle this problem is to develop stochastic (online) AUC maximization methods, since stochastic (online) learning has proven very efficient for large-scale data in the machine learning community. This thesis proposes an adaptive mini-batch stochastic gradient method that directly optimizes AUC for large-scale data sets, and an adaptive robust method for online AUC maximization aimed at large-scale environments with noisy data. The main contributions of the thesis are summarized as follows:

(1) An adaptive mini-batch stochastic gradient descent method for AUC maximization, termed AMAUC, is proposed. The algorithm combines a mini-batch framework with L2 regularization to form the objective function: the mini-batch scheme greatly reduces the complexity of pairwise AUC optimization, the L2 term prevents the model from overfitting, and a projected gradient method handles the inner optimization. To further improve the
performance, an adaptive learning rate updating strategy is also proposed, in which second-order information from historical gradients drives feature-wise updates: frequently occurring features are assigned low learning rates while rarely occurring features are given high learning rates. Furthermore, the convergence rate of the AUC optimization is improved to O(log(T)/T). Experiments on large-scale benchmarks and high-dimensional data sets demonstrate the efficiency and effectiveness of the proposed AMAUC.

(2) In large-scale applications the training data often contain noise. Many effective online AUC maximization algorithms have been proposed, but despite their promising performance most are sensitive to outliers, which makes them unsuitable for applications with noisy data. This thesis presents an adaptive robust method for online AUC maximization, called AROAM. Specifically, a ramp-loss-based objective function oriented to the AUC metric is first defined in AROAM; its bounded loss strongly suppresses the influence of outliers. The concave-convex procedure is then adopted to obtain a convex approximation of the objective function. Further, a momentum method and an adaptive strategy are adopted in each iteration to update the classifier effectively and improve the performance of AROAM. Experiments on benchmark data demonstrate the effectiveness of the proposed method, and experiments on data sets with different noise rates verify its robustness.
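As a rough illustration of the ideas behind AMAUC (a sketch, not the thesis's exact algorithm), the code below optimizes a pairwise hinge surrogate of AUC with mini-batch stochastic gradient descent, L2 regularization, and an AdaGrad-style feature-wise learning rate built from accumulated squared gradients, so that frequently updated features receive smaller steps. All function names and hyperparameter values here are hypothetical.

```python
import numpy as np

def pairwise_hinge_auc_grad(w, X_pos, X_neg):
    """Gradient of the pairwise hinge surrogate for AUC on one mini-batch.

    AUC counts pairs (x+, x-) ranked correctly by w; the hinge surrogate
    penalizes max(0, 1 - w.(x+ - x-)) for every positive/negative pair.
    """
    diffs = X_pos[:, None, :] - X_neg[None, :, :]   # (n+, n-, d) pair differences
    margins = diffs @ w                              # (n+, n-) pairwise margins
    active = (margins < 1.0).astype(float)           # pairs violating the margin
    grad = -(active[:, :, None] * diffs).sum(axis=(0, 1))
    return grad / (X_pos.shape[0] * X_neg.shape[0])

def amauc_sketch(X, y, batch=32, lam=1e-3, eta=0.5, epochs=5, eps=1e-8, seed=0):
    """Mini-batch SGD for the AUC surrogate with feature-wise adaptive steps."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    g_sq = np.zeros(d)          # accumulated squared gradients (second-order history)
    pos, neg = X[y == 1], X[y == 0]
    for _ in range(epochs):
        for _ in range(n // batch):
            Xp = pos[rng.integers(0, len(pos), batch)]
            Xn = neg[rng.integers(0, len(neg), batch)]
            g = pairwise_hinge_auc_grad(w, Xp, Xn) + lam * w   # L2 regularization
            g_sq += g * g
            # feature-wise step: large accumulated gradients -> small learning rate
            w -= eta / np.sqrt(g_sq + eps) * g
    return w
```

Sampling equal numbers of positives and negatives per mini-batch keeps the number of pairs at batch², instead of the O(n+ · n-) pairs a full-batch pairwise objective would require.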
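Similarly, a minimal sketch of the ramp-loss idea behind AROAM (hypothetical names, not the thesis's implementation): the ramp loss caps the hinge penalty so outliers contribute a bounded amount, and one concave-convex procedure (CCP) linearization reduces each step to a convex hinge problem whose gradient simply ignores pairs that have fallen into the outlier region.

```python
import numpy as np

def ramp_loss(margin, s=-1.0):
    """Ramp loss: the hinge loss capped at (1 - s), so badly misranked
    (outlier) pairs contribute a bounded penalty instead of an unbounded one."""
    hinge = np.maximum(0.0, 1.0 - margin)
    return np.minimum(hinge, 1.0 - s)

def ccp_step_grad(w, diffs, s=-1.0):
    """One CCP step for the pairwise ramp loss.

    ramp(z) = hinge_1(z) - hinge_s(z); linearizing the concave part
    (-hinge_s) at the current w leaves a convex hinge problem whose
    subgradient only involves pairs with margin in (s, 1) -- pairs with
    margin <= s are treated as outliers and dropped.
    """
    margins = diffs @ w
    active = (margins < 1.0) & (margins > s)
    return -(diffs[active]).sum(axis=0) / max(len(diffs), 1)
```

Because the loss is flat for margins below s, a single grossly mislabeled pair cannot dominate the update, which is the robustness property the abstract attributes to AROAM.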
Keywords/Search Tags:Imbalanced binary classification, AUC, Stochastic learning, Online learning, Robust learning