Fraud In Advertising And Banking Based On Feature Engineering And Mean-uncertain Logistic Regression

Posted on: 2024-04-14 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Y Zhang
GTID: 2569306923474994 | Subject: Financial mathematics and financial engineering
Abstract/Summary:
With the development of the economy and technology, the risk of fraud in many fields is steadily rising. To guard against fraud, many data warehouses have been built and huge amounts of data stored, but recognizing fraud in large volumes of high-dimensional data also requires an efficient and well-founded fraud recognition model. This paper selects two typical fraud datasets: advertising fraud and bank credit card fraud. On top of traditional derived-feature engineering, a new feature-derivation approach based on sliding windows and mean uncertainty is used to derive new features for each dataset and make better use of the information implicit in the data. The paper then explores the range of class imbalance over which mean-uncertain logistic regression is suitable. Logistic regression, mean-uncertain logistic regression, XGBoost and LightGBM are used to build fraud recognition models. Following a fusion-model architecture and the principle of algorithmic diversity, these four models are combined by a weighting method, a performance-evaluation method, a grid search method and a stacking method, yielding four combination models: E-LR-unLR-LGB-XGB, C-LR-unLR-LGB-XGB, G-LR-unLR-LGB-XGB and S-LR-unLR-LGB-XGB. In addition, to measure the effect of the new derived features on fraud recognition, two groups of single and combined models are built: one on the datasets with derived features and one on the datasets without them.

Finally, the models are compared on the datasets with and without derived features, and single models are compared with combined models. On the advertising fraud dataset, adding the derived features raises the accuracy, precision, recall, AUC and comprehensive performance score of the four single models by 3.50%, 3.04%, 4.01%, 0.0426 and 0.0411 respectively, and those of the combined models by 3.45%, 2.68%, 4.72%, 0.0391 and 0.0440 on average. On the bank credit card fraud dataset, the same metrics rise by 20.41%, 19.30%, 16.57%, 0.1299 and 0.1514 on average for the single models, and by 27.43%, 12.49%, 10.5%, 0.1462 and 0.1218 for the combined models. On both balanced and imbalanced datasets, the best combination model achieves a higher recall and comprehensive performance score than the best single model, which shows that it is robust and has stronger fraud recognition ability. In summary, the theoretical analysis and empirical results show that, on both datasets, the combination models built with the proposed derived features have better fraud recognition ability.
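The abstract does not give the exact construction of the sliding-window, mean-uncertainty features. Under mean uncertainty, the mean of a variable is treated as an interval rather than a single number. One simple way to estimate that interval, assumed here, is to average each window of an ordered series and take the smallest and largest window means as the lower and upper means. The sketch below follows that assumption; the window length, the step of one, the per-entity grouping and all function names are hypothetical choices for illustration, not the thesis's own code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical sketch: lower/upper mean features from sliding-window means.
# Window length, step of 1 and grouping key are assumptions, not the thesis's settings.


def window_means(values: List[float], window: int) -> List[float]:
    """Mean of every contiguous window of length `window`, sliding one step at a time."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value and drop the oldest one.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


def mean_uncertainty_features(values: List[float], window: int) -> Tuple[float, float, float]:
    """Return (lower mean, upper mean, interval width) from the window means."""
    means = window_means(values, window)
    lower, upper = min(means), max(means)
    return lower, upper, upper - lower


def per_entity_features(
    rows: Iterable[Tuple[str, float]], window: int
) -> Dict[str, Tuple[float, float, float]]:
    """Group (entity, value) rows in arrival order and derive features per entity.

    Entities with fewer than `window` observations are skipped.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for entity, value in rows:
        groups[entity].append(value)
    return {
        entity: mean_uncertainty_features(values, window)
        for entity, values in groups.items()
        if len(values) >= window
    }


if __name__ == "__main__":
    clicks = [("ip_1", 1.0), ("ip_1", 3.0), ("ip_2", 5.0), ("ip_1", 2.0)]
    print(per_entity_features(clicks, window=2))
```

Grouping card transactions by card number, or ad clicks by IP address, would give each entity a lower mean, an upper mean and an interval width as new columns; which keys and window lengths the thesis actually uses is not stated here.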
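The weighting-style combinations described in the abstract can be sketched as a weighted average of each base model's predicted fraud probability. The block below assumes three simple weighting rules: fixed weights, weights proportional to each model's validation score (for example AUC), and an exhaustive grid search over weights that keeps the combination with the highest AUC. These rules, the grid spacing and the function names are hypothetical; the thesis's exact weighting, performance-evaluation and grid-search schemes are not specified in the abstract, and stacking, which trains a second-level model on the base models' outputs, is not shown.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# Hypothetical sketch of weighted model combination; rules and names are illustrative.


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of (fraud, non-fraud) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("labels must contain both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def weighted_average(prob_lists: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Combine per-model fraud probabilities, sample by sample, with the given weights."""
    n_samples = len(prob_lists[0])
    return [sum(w * probs[i] for w, probs in zip(weights, prob_lists)) for i in range(n_samples)]


def performance_weights(scores: Sequence[float]) -> List[float]:
    """Weights proportional to each model's validation score (e.g. AUC)."""
    total = sum(scores)
    return [s / total for s in scores]


def grid_search_weights(
    prob_lists: Sequence[Sequence[float]], labels: Sequence[int], steps: int = 10
) -> Tuple[Optional[List[float]], float]:
    """Try every weight vector with entries k/steps summing to 1; keep the highest AUC."""
    m = len(prob_lists)
    best_w: Optional[List[float]] = None
    best_auc = -1.0
    for head in product(range(steps + 1), repeat=m - 1):
        rest = steps - sum(head)
        if rest < 0:
            continue  # the first m-1 weights already exceed 1
        weights = [k / steps for k in (*head, rest)]
        score = auc(labels, weighted_average(prob_lists, weights))
        if score > best_auc:
            best_w, best_auc = weights, score
    return best_w, best_auc


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    model_a = [0.1, 0.4, 0.35, 0.8]
    model_b = [0.2, 0.1, 0.9, 0.3]
    print(performance_weights([auc(labels, model_a), auc(labels, model_b)]))
    print(grid_search_weights([model_a, model_b], labels))
```

The grid grows quickly with the number of models (four models at a spacing of 0.1 already give 286 weight vectors), so a coarse spacing keeps the search cheap.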
Keywords/Search Tags: Fraud recognition, Machine learning, Feature engineering, Nonlinear expectations