| As Internet technology and the financial industry have become more closely integrated, payment methods have been comprehensively upgraded, greatly improving payment convenience for users. Online payment, represented by third-party payment services, has become one of the mainstream modes of consumption. However, as the volume of payment orders and the transaction amounts of online payment grow, the types of online payment risk are becoming increasingly diverse, and risk identification has become a research trend in the payment field in recent years. Online payment data sets are characterized by large scale, high dimensionality, extremely unbalanced class distribution, and long-term dependence, which are the main difficulties in risk identification research. This paper addresses two of these difficulties: unbalanced data distribution and long-term dependence. It tackles the imbalance at the data level by proposing a new SMOTE method, and tackles long-term dependence at the level of model construction. First, to address the unbalanced distribution of online payment data sets, this paper proposes GMM-BSMOTE, an improved boundary oversampling technique that expands the minority class. The method takes the distribution of the minority samples into account and emphasizes the generation of boundary samples. Experimental results on data sets with different degrees of imbalance show that the method clearly improves classifier performance and remains effective under high imbalance. Second, to address the long-term dependence in online payment data sets, this paper constructs an LSTM-AdaBoost risk identification model and verifies its effectiveness on a real online payment data set. On the data set processed by GMM-BSMOTE, the paper compares the LSTM-AdaBoost risk identification model with other common classification algorithms. The results show that the LSTM-AdaBoost model identifies risks effectively, performing especially well on Recall, F1, and AUC. The paper also compares the results of LSTM-AdaBoost on data sets produced by different sampling methods. The results show that GMM-BSMOTE effectively improves the risk identification performance of the LSTM-AdaBoost model. |
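
The abstract does not specify how GMM-BSMOTE works internally, so the following is only a minimal sketch of plain Borderline-SMOTE, the general boundary oversampling idea the abstract refers to. The GMM component, which accounts for the minority distribution, is not reproduced here. The function name `borderline_smote`, its parameters, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def borderline_smote(X, y, minority=1, m=5, k=3, n_new=10, seed=0):
    """Sketch of Borderline-SMOTE: synthesize minority points near the boundary.

    This is NOT the paper's GMM-BSMOTE; the GMM step is omitted.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]

    # Step 1: a minority sample is "in danger" when at least half, but not
    # all, of its m nearest neighbours (over the whole set) are majority.
    danger = []
    for i in min_idx:
        rest = [j for j in range(len(X)) if j != i]
        nearest = sorted(rest, key=lambda j: math.dist(X[i], X[j]))[:m]
        n_major = sum(1 for j in nearest if y[j] != minority)
        if len(nearest) / 2 <= n_major < len(nearest):
            danger.append(i)

    # Step 2: interpolate between a danger sample and one of its k nearest
    # minority neighbours to create each synthetic sample.
    synthetic = []
    if not danger or len(min_idx) < 2:
        return synthetic
    for _ in range(n_new):
        i = rng.choice(danger)
        peers = [j for j in min_idx if j != i]
        nearest = sorted(peers, key=lambda j: math.dist(X[i], X[j]))[:k]
        j = rng.choice(nearest)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy data (hypothetical): 8 majority points (label 0), 4 minority (label 1).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [2, 0], [0, 2], [2, 1],
     [2, 2], [3, 3], [3, 2], [1.5, 1.5]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

new_points = borderline_smote(X, y, minority=1, m=3, k=2, n_new=6, seed=0)
```

In this toy set only the minority point `[1.5, 1.5]` sits close enough to the majority cluster to be treated as a boundary sample, so every synthetic point is generated from it. Samples whose neighbours are all majority are skipped as likely noise, which is the usual Borderline-SMOTE convention.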