| The vigorous development of online payment has produced massive volumes of transaction data. Accurately capturing abnormal trading behavior in this data and improving the efficiency of risk management are two major challenges for many online payment companies. Online transaction data arrives in batches and is high-dimensional, sparse, class-imbalanced, and class-overlapping; this thesis refers to such data as complex streaming data and focuses on anomaly detection for it. By application scenario, existing anomaly detection models fall into two groups: offline detection and online detection. Offline detection is applied to static data, while online detection is applied to streaming data that arrives one instance at a time; neither has been used to detect anomalies in complex streaming data. Meanwhile, catastrophic forgetting is one of the most frequent phenomena in streaming data, yet it has received little attention in anomaly detection. Batch arrival and catastrophic forgetting in complex streaming data form the first research problem of this thesis. In addition, complex streaming data exhibits serious class overlap. The prerequisites of subspace methods run contrary to the characteristics of complex streaming data, while existing dimension reduction models are biased towards preserving the information in the original data and therefore cannot handle class overlap. This is the second research problem. The main work of this thesis is as follows: (1) An anomaly detection framework, SADEN, is built for complex streaming data. The framework has three components: a hierarchical replay mechanism, a feature representation module, and an adaptive integrated model. The hierarchical replay mechanism continuously replays past information so that historical information in the data is retained. Different sampling strategies, chosen according to the class label, are used to construct the exemplar sets so as to alleviate catastrophic forgetting and handle class imbalance. The feature representation module uses classifiers to generate a new feature vector in place of the original one, so that sparse information can be handled when mining the transaction data. The adaptive integrated model uses an adaptive weight update mechanism that lets SADEN combine the strengths of offline and online detection. Empirical experiments show that, compared with the benchmark, the SADEN framework improves AUPRC by nearly 4 times and AUROC by 1.4%. (2) A hierarchical weighted autoencoder, denoted HAE, is proposed. On the one hand, HAE designs different mapping functions for instances according to their class labels, in order to increase the distance between normal and abnormal instances in a low-dimensional feature space and thereby deal with the class overlap problem. On the other hand, a new loss function is constructed by adding IV values as feature weights, so that features that contribute more to the class labels receive greater attention during training. The experiments verify that HAE effectively increases the accuracy of the subsequent anomaly detection task. In summary, the thesis addresses how to detect anomalies and how to reduce the dimension of the feature space in complex streaming data, a problem of great importance for risk management in online payment companies. Extensive numerical experiments verify the effectiveness and generalization of the proposed models. |
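The IV-weighted loss described for HAE can be illustrated with a minimal sketch. The abstract does not say how the IV values are computed or what form the loss takes, so the following are assumptions: IV is taken to be the information value commonly used in credit scoring, computed over quantile bins; the loss is a squared reconstruction error with IV values normalized to sum to one. The function names `information_value` and `iv_weighted_mse` are hypothetical, and the class-specific mapping functions of HAE are not covered here.

```python
import numpy as np


def information_value(feature, labels, n_bins=5, eps=1e-6):
    """Information value of one numeric feature against binary labels
    (1 = abnormal, 0 = normal). The binning scheme is an assumption."""
    # Quantile bin edges; np.unique drops duplicate edges.
    edges = np.unique(np.quantile(feature, np.linspace(0.0, 1.0, n_bins + 1)))
    if len(edges) < 3:
        # A single bin carries no information about the label.
        return 0.0
    n_eff = len(edges) - 1
    # Interior edges map every value to a bin index in 0..n_eff-1.
    bins = np.digitize(feature, edges[1:-1], right=True)
    pos = labels == 1
    iv = 0.0
    for b in range(n_eff):
        in_bin = bins == b
        # Share of abnormal / normal instances falling in this bin,
        # smoothed by eps so that empty bins do not produce log(0).
        p_pos = (np.sum(in_bin & pos) + eps) / (np.sum(pos) + eps * n_eff)
        p_neg = (np.sum(in_bin & ~pos) + eps) / (np.sum(~pos) + eps * n_eff)
        iv += (p_pos - p_neg) * np.log(p_pos / p_neg)
    return float(iv)


def iv_weighted_mse(x, x_hat, iv_weights):
    """Squared reconstruction error with per-feature weights.
    Weights are normalized to sum to 1 (an assumption of this sketch)."""
    total = float(np.sum(iv_weights))
    if total <= 0.0:
        raise ValueError("at least one feature must have a positive weight")
    w = np.asarray(iv_weights, dtype=float) / total
    return float(np.mean(np.sum(w * (x - x_hat) ** 2, axis=1)))


# Synthetic demo: feature 0 is shifted for abnormal instances, feature 1 is noise.
rng = np.random.default_rng(0)
labels = (rng.random(2000) < 0.1).astype(int)
X = np.column_stack([
    rng.normal(size=2000) + 3.0 * labels,
    rng.normal(size=2000),
])
ivs = np.array([information_value(X[:, j], labels) for j in range(X.shape[1])])

# The same unit error costs more on the informative feature than on the noisy one.
x = np.zeros((4, 2))
loss_err_on_f0 = iv_weighted_mse(x, x + np.array([1.0, 0.0]), ivs)
loss_err_on_f1 = iv_weighted_mse(x, x + np.array([0.0, 1.0]), ivs)
```

Under these assumptions, a reconstruction error on a feature with a high IV is penalized more heavily than the same error on a weakly informative feature, which is the effect the abstract attributes to the weighted loss.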