| With the development of information technology and the spread of online activity, electronic payment has gradually become one of the online activities most closely tied to people’s daily lives. While online payment methods bring convenience, they also increase the risk of fraudulent transactions. Although credit card fraud accounts for only a small part of the total volume of credit card transactions, it still causes huge losses to banks. If left uncontrolled, credit card fraud can also cause investors to lose confidence in the financial system, resulting in a bank run. Eventually, it may even threaten the security of the entire national financial system and increase national systemic risk. Therefore, credit card fraud identification is very important in the whole credit card transaction system. Credit card fraud usually refers to unauthorized purchases made by illegally using others’ credit cards, stealing others’ credit card information, or stealing money from others’ bank accounts. At present, there are two main ways to combat fraud, fraud prevention and fraud identification, and two supporting systems, the fraud prevention system and the fraud detection system, have been developed. Fraud detection systems are divided into rule-based systems and systems based on big data technology. In a rule-based system, experts summarize their experience and set rules based on past transaction records. Such systems have an obvious lag, and the rule base is large and not easy to update. Therefore, fraud detection systems based on big data technology are favored by banks. This paper takes an open credit card transaction data set as the research object and analyzes the characteristics of the data set, the problems that may arise when modeling an imbalanced data set, and the causes of those problems. The deep learning algorithm is improved under a random ensemble framework. A stochastic integrated neural network model is proposed and an empirical
analysis is carried out on the credit card data set to verify the effectiveness of the model. The specific research conclusions are as follows. First, the credit card transaction data set is analyzed, and the problems that the imbalanced data set poses for modeling, together with their causes, are examined through feature distribution diagrams and diagrams of the relationships between features. It is found that the good classification performance of machine learning and deep learning models depends on how far the fitted feature distribution curves of the two classes separate; features with such separation make a large feature contribution. However, because of the large gap between the sample sizes of the two classes in the imbalanced data set, the majority class can still completely cover the minority class even when the curve trends differ, so the model cannot learn the effective features of the two types of transactions during modeling. Then, based on the Borderline-SMOTE upsampling method, this paper fills the data set at different sampling proportions and analyzes the feature contributions and the relationships between features. It is found that upsampling concentrates the feature contribution and weakens the contribution of secondary features. Without changing the trend direction of the fitted curves of the feature distributions, sampling increases the degree of separation of the feature distributions, thus improving the learnability of the data set itself, and the degree of separation grows as the sampling proportion increases. After analyzing the sampling method, this paper selects several comparison models and experimental models and conducts empirical modeling analysis on the original data set and three sampled data sets with different sampling ratios. The results show that, as the sampling proportion increases, the sample distribution in the data set effectively affects the overall preference of the model. Based on different sub-network structures, the model proposed in
this paper achieves the best results in both the identification of fraudulent transactions and overall identification performance. Moreover, big data models whose optimization objective is to minimize a loss function are easily affected by the sample distribution of the training set, whereas tree-structured models are less sensitive to sample imbalance. However, models trained on error can achieve a higher ability to identify fraudulent transactions. In addition, the appropriate model should be selected according to the classification difficulty of the data set. For highly overlapping data sets such as credit card transaction data, linear support vector machines and logistic regression are limited by the capacity of the models themselves, and increasing the sampling proportion only changes the overall preference of the model; it cannot help the model learn fraudulent transactions more effectively. Finally, a new idea of combined model detection is proposed for card issuers, combining the strong fraud identification ability of the stochastic integrated neural network with the high accuracy of the random forest model. That is, the stochastic integrated neural network is first used to predict the samples suspected of being fraudulent transactions, and the random forest is then used to screen out and delete the false positives (FPs) among them. Finally, the retained samples enter the next stage, manual review, which can greatly reduce labor costs. |
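
The combined detection idea described at the end of the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function `cascade_review_queue`, the two thresholds, and the toy scoring functions are hypothetical names introduced here for illustration. The first scorer stands in for the stochastic integrated neural network, the second for the random forest, and each is assumed to return a fraud probability between 0 and 1.

```python
from typing import Callable, List, Sequence

Transaction = Sequence[float]
Scorer = Callable[[Transaction], float]  # assumed to return a fraud score in [0, 1]


def cascade_review_queue(
    transactions: Sequence[Transaction],
    stage1_score: Scorer,
    stage2_score: Scorer,
    stage1_threshold: float = 0.5,  # hypothetical default, not from the paper
    stage2_threshold: float = 0.5,  # hypothetical default, not from the paper
) -> List[int]:
    """Return indices of transactions that should go to manual review."""
    # Stage 1: the first model flags every transaction it suspects of fraud.
    suspected = [
        i for i, tx in enumerate(transactions)
        if stage1_score(tx) >= stage1_threshold
    ]
    # Stage 2: the second model screens the suspects and drops those it
    # considers false positives; only the rest are retained.
    return [
        i for i in suspected
        if stage2_score(transactions[i]) >= stage2_threshold
    ]


# Toy stand-ins for the two trained models (assumptions for illustration only).
def toy_stage1(tx: Transaction) -> float:
    return min(tx[0] / 1000.0, 1.0)  # scores by the first feature


def toy_stage2(tx: Transaction) -> float:
    return tx[1]  # scores by the second feature


toy_transactions = [[10.0, 0.1], [950.0, 0.9], [900.0, 0.2], [20.0, 0.8]]
review_queue = cascade_review_queue(toy_transactions, toy_stage1, toy_stage2)
```

In this toy run the first stage flags the second and third transactions, and the second stage discards the third as a false positive, so only one transaction reaches manual review. With real models, the two scorers would wrap the trained classifiers' probability outputs.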