Amid the digital transformation of the financial industry, preventing and controlling credit risk supports the healthy development of consumer credit. Achieving this requires actively introducing and developing technologies such as data mining, model algorithms, and network security to empower the business, so that science and technology can better play their important supporting role in the digital economy and in financial regulation. Credit data sets collect transaction data across many dimensions and scenarios; they are large and generally high-dimensional. Fraudulent behavior, however, is extremely rare in practice and usually differs markedly from the features found in normal behavior data, so fraudulent records are regarded as outliers that are well worth mining. Improving the ability to identify these malicious fraud outliers is a key step in credit risk control.

In outlier detection research, recognizing outliers in large high-dimensional data and recognizing them in imbalanced data sets have become two research hotspots. When a data set is high-dimensional, the raw distance or density between sample points no longer reflects their similarity accurately, which degrades the algorithm's recognition performance. At the same time, because most machine learning algorithms assume a relatively balanced class distribution, class imbalance also makes anomaly detection techniques harder to apply.

In this paper, we use MATLAB to study outlier detection for credit card fraud. Working at the data level, Chapter 3 designs an undersampling method based on the sparse subspace clustering (SSC) algorithm. After showing that SSC clusters high-dimensional data well, the method undersamples the majority class using the idea of nearest neighbors, taking the distribution of the sample points into account while the data set is reduced, which makes the newly constructed balanced data set more reasonable. Chapter 4 then builds on this undersampling method by adding an SSC-based improvement of SMOTE oversampling, so that the data set is balanced by hybrid sampling. This reduces the error caused by randomly or blindly shrinking and expanding the data set, alleviates the poor performance of machine learning algorithms on imbalanced data, and improves the generalization ability of the outlier detection algorithm. Finally, an SVM classifier is applied to detect outliers in the balanced data set. Experiments verify that the proposed algorithm identifies and predicts outliers in the credit data set well.
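The SSC-based undersampling idea can be illustrated with a minimal pure-Python sketch. It is not the MATLAB implementation described above, and several parts are assumptions: each majority-class sample is written as a sparse combination of the other samples by solving a Lasso problem with cyclic coordinate descent (the self-expressive step of SSC, with the diagonal coefficient forced to zero); the affinity |C| + |C|^T is then built, and clusters are read off as connected components of that affinity graph rather than by the spectral clustering used in full SSC. The keep rule (retain the `keep_ratio` fraction of each cluster closest to the cluster mean, as a nearest-neighbor style criterion) and all names (`ssc_labels`, `ssc_undersample`, `lam`, `eps`, `keep_ratio`) are hypothetical.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sparse_self_representation(X, lam=0.01, n_iter=100):
    """For each x_i solve min_c 0.5*||x_i - sum_j c_j x_j||^2 + lam*||c||_1
    with c_i = 0, using cyclic coordinate descent (a Lasso solver)."""
    n = len(X)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        r = list(X[i])  # residual x_i - sum_j C[i][j] * x_j
        for _ in range(n_iter):
            for j in range(n):
                sq = dot(X[j], X[j])
                if j == i or sq == 0.0:
                    continue  # enforce C[i][i] = 0; skip zero vectors
                rho = dot(X[j], r) + C[i][j] * sq
                new = soft_threshold(rho, lam) / sq
                delta = new - C[i][j]
                if delta != 0.0:
                    r = [rk - delta * xk for rk, xk in zip(r, X[j])]
                    C[i][j] = new
    return C


def ssc_labels(X, lam=0.01, eps=1e-6):
    """Cluster labels from connected components of the affinity |C| + |C|^T.
    Simplification (assumption): full SSC runs spectral clustering here."""
    C = sparse_self_representation(X, lam)
    n = len(X)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first search over affinity edges
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and abs(C[i][j]) + abs(C[j][i]) > eps:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def ssc_undersample(majority, keep_ratio, lam=0.01):
    """Hypothetical keep rule: in each SSC cluster, keep the ceil(keep_ratio * size)
    samples nearest to the cluster mean (at least one per cluster)."""
    labels = ssc_labels(majority, lam)
    kept = []
    for lab in sorted(set(labels)):
        members = [x for x, l in zip(majority, labels) if l == lab]
        dim = len(members[0])
        centre = [sum(x[d] for x in members) / len(members) for d in range(dim)]
        k = max(1, math.ceil(keep_ratio * len(members)))
        members.sort(key=lambda x: math.dist(x, centre))
        kept.extend(members[:k])
    return kept
```

For example, majority samples lying on two different lines through the origin are separated into two clusters, and undersampling keeps representatives from both, so the reduced set still covers each subspace instead of discarding one by chance.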
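One plausible reading of an "SSC-based improvement of SMOTE" is to let SMOTE interpolate only between minority samples that fall in the same cluster, so synthetic points are not generated across unrelated subspaces. The pure-Python sketch below follows that assumption; the abstract does not give the exact rule. It takes cluster labels as input (for instance from SSC on the minority class), and the function name `cluster_smote` and its parameters `n_new`, `k`, and `seed` are hypothetical. The interpolation itself is standard SMOTE: a new point is x + u * (x_nn - x) with u drawn uniformly from [0, 1) and x_nn one of the k nearest neighbors of x.

```python
import math
import random


def cluster_smote(minority, labels, n_new, k=3, seed=0):
    """SMOTE restricted to same-cluster neighbours (an assumption of this sketch).

    minority: list of feature tuples; labels: cluster id for each sample.
    Returns n_new synthetic samples as tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    # Only samples with at least one same-cluster partner can seed new points.
    seeds = [i for i in range(len(minority))
             if sum(1 for l in labels if l == labels[i]) > 1]
    if not seeds:
        return synthetic
    while len(synthetic) < n_new:
        i = rng.choice(seeds)
        x = minority[i]
        # Candidate neighbours: other samples from the same cluster, nearest first.
        same = [j for j in range(len(minority))
                if j != i and labels[j] == labels[i]]
        same.sort(key=lambda j: math.dist(x, minority[j]))
        nn = minority[rng.choice(same[:k])]
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nn)))
    return synthetic
```

Because every synthetic point lies on a segment between two members of one cluster, the oversampled minority class stays inside the regions the clustering identified, which is the kind of "non-blind" expansion the hybrid sampling scheme aims for.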
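The abstract does not specify the SVM kernel or solver used in the final detection step, so the sketch below swaps in a linear soft-margin SVM trained with the Pegasos stochastic subgradient method, written in pure Python. It minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)) with labels in {-1, +1} (for example +1 for fraud and -1 for normal), and it absorbs the bias by appending a constant feature 1 to each sample, so the bias is regularized too. The names `train_linear_svm` and `predict` and the defaults for `lam`, `epochs`, and `seed` are hypothetical.

```python
import random


def _augment(x):
    # Append a constant 1 so the last weight acts as a (regularised) bias term.
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic subgradient descent on the linear soft-margin SVM."""
    rng = random.Random(seed)
    data = [_augment(x) for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wk * xk for wk, xk in zip(w, data[i]))
            # Gradient step on the L2 regulariser.
            w = [(1.0 - eta * lam) * wk for wk in w]
            # Hinge-loss subgradient step only when the margin is violated.
            if margin < 1.0:
                w = [wk + eta * y[i] * xk for wk, xk in zip(w, data[i])]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the linear decision function."""
    score = sum(wk * xk for wk, xk in zip(w, _augment(x)))
    return 1 if score >= 0 else -1
```

On a balanced training set produced by the sampling steps, the classifier would be fitted once and then applied to unseen transactions; a kernel SVM would replace the dot product with a kernel evaluation but follows the same train-then-predict pattern.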