The Study For Fraud Detection Of Credit Card Based On Imbalanced Data

Posted on: 2018-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: M Huang
Full Text: PDF
GTID: 2359330518990338
Subject: Applied statistics
Abstract/Summary:
With the continued development of the world economy and of information technology, more and more people use credit cards for transactions. In Europe and America in particular, the credit card has become a very important payment method in modern life, and the credit card industry is well established. In China, although credit cards have a relatively short history, the industry is developing rapidly. Overdraft spending on credit cards has become a new consumption pattern in China and may become a main one.

As credit card usage expands rapidly, credit card fraud has become a serious problem that can no longer be ignored. It not only causes commercial banks huge economic losses but also costs them a large number of customers. Credit card fraud considerably affects risk control at commercial banks and seriously hinders the normal development of the financial system in China, so an appropriate method for monitoring and identifying it is urgently needed. Data mining is gradually maturing, and using data mining and machine learning to identify fraud is a new research trend.

In this paper, we use two days of transaction data from the Europe Credit Card Centre to build models that identify and monitor fraud. We address the imbalanced-data problem, which is caused by the small number of fraudulent transactions, from both the data perspective and the algorithm perspective. From the data perspective, we use the SMOTE sampling method to increase the number of fraud cases until it equals the number of normal cases, and then build a model with logistic regression. From the algorithm perspective, we introduce a cost adjustment function and build the model by giving the different classes different weights; specifically, we adopt AdaCost, an improved version of the AdaBoost algorithm. Finally, we evaluate the models with AUPRC (Area Under the Precision-Recall Curve) and AUROC (Area Under the Receiver Operating Characteristic curve).

We find that, from either the data perspective or the algorithm perspective, both models identify fraud well, and their performance is basically the same.
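The abstract does not say how the oversampling or the evaluation was implemented, so the following is only a minimal sketch under stated assumptions, written with the Python standard library. It illustrates the core SMOTE idea (a synthetic minority point is placed at a random position on the segment between a minority point and one of its k nearest minority neighbours) and simple estimates of the two metrics named above. The function names `smote`, `auroc`, and `average_precision` are hypothetical, the step-wise average precision is one common estimate of AUPRC that ignores tied scores, and the logistic regression and AdaCost models themselves are not reproduced here.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    sampled minority point and one of its k nearest minority neighbours.
    Assumes at least two minority points (illustrative sketch only)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve:
    the mean of the precision values at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


if __name__ == "__main__":
    fraud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # toy minority class
    print(smote(fraud, n_new=3, k=2))
    print(auroc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

Under heavy class imbalance, AUPRC is usually the more informative of the two metrics, because precision depends directly on the number of false positives relative to the few true fraud cases, whereas AUROC can stay high while precision is poor.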
Keywords/Search Tags: credit card fraud, imbalanced data, SMOTE, logistic regression, AdaCost