In recent years, with the development of internet finance, illegal pyramid schemes (or Multi-Level Marketing, MLM) have broken through the limitations of traditional regions and social relationships and shown explosive growth and expansion. This seriously threatens the economic and property security of the public and greatly undermines the security of the country's financial environment. As a type of collective fraud, an illegal pyramid scheme operates through a complex organizational mode and has a strong capability for anti-investigation. However, the widely used detection methods, which mostly rely on intelligence clues, data retrieval, and manual analysis, fall far short of the government's requirements for supervising illegal pyramid schemes. Therefore, applying data analysis technologies to discover pyramid schemes in different investigation scenarios is an important research topic from both theoretical and practical perspectives. Motivated by the real investigation requirements for clue supplementation and evidence discovery, this paper conducts a series of exploratory studies on discovering pyramid schemes using financial big data. It considers four levels of pyramid-scheme behavior: the individual, the organization, the trading process, and the trading pattern. The related problems are abstracted into several knowledge discovery tasks that detect pyramid-scheme behaviors by analyzing the trading features hidden in financial data.

The main research work and innovations of this paper are as follows.

1) For the problem of identifying MLM individuals, two node identification methods are proposed for detecting MLM accounts on a financial social network and on a financial transaction network (FiTNet), respectively. Both methods focus on measuring the attributes of node relationships but differ in how they generate the classification features. To quantify node similarity using the nodes' social relationship
information, we propose a node feature learning algorithm based on measuring multiple attribute-level similarities, which maps the nodes into multiple independent hash spaces simultaneously. By introducing supervision information and a transition subspace into the learning process, the algorithm gains a strong ability to embed semantic category information. Experimental results on public and MLM data sets show that it measures the similarity between MLM individuals with high accuracy. For identifying MLM individuals on FiTNet, we first propose a method to quantify the two kinds of transaction attribute information that the network contains, namely transaction process information and transaction content. Second, we propose a node classification algorithm based on jointly measuring these two heterogeneous attributes. In this algorithm, a two-branch convolutional neural network extracts each attribute independently while optimizing the classification features. Finally, the experimental results show that the generated features measure the similarity of MLM nodes comprehensively and achieve high accuracy in detecting MLM transaction accounts.

2) For the problem of detecting an entire MLM organization, this paper proposes an MLM organization completion method based on subgraph structure embeddings to discover MLM groups. The method takes into account the structural characteristics that each MLM group acquires from its internal transaction relationships. Governed by the operating mechanism of an MLM organization, the behavior within each group is characterized by nodes in different roles connecting with each other in specific structures. The original problem is therefore abstracted into identifying subgraphs with the same structure in FiTNet. This paper devises a subgraph structure feature learning algorithm that embeds both the global and the local structural information of an MLM group into the classification feature vector. The
experimental results comparing our algorithm with other node-level and subgraph-level methods demonstrate its superior performance in expressing the structural features of MLM groups.

3) For the problem of extracting evidence for MLM judgment, this paper proposes an MLM trading path discovery method based on an optimized ant colony model. The discovered paths provide auxiliary information for judging an individual's degree of involvement in the fraud. The method is designed around the overall trading behavior of an MLM group: funds converge from the bottom of the group to its top, and most of the funds are transferred along a small subset of trading paths. This paper regards the paths that carry the main capital circulation in FiTNet as the MLM trading paths. It then abstracts the problem as extracting the trading paths that retain the major trading function of the original network. Inspired by the ant colony model, the foraging process is used to model the fund transaction process in order to discover the main paths. The path transition and pheromone update mechanisms of the ant colony model are adapted specifically so that the volume of pheromone accumulated on a path represents the amount of funds that the path carries. A case study of MLM path discovery verifies the feasibility and effectiveness of the method.

4) For the problem of checking whether an MLM exists, this paper proposes a pyramid-scheme trading pattern mining model based on the transaction characteristics of MLM. The model uses comparative frequency conditions and provides seed clues for MLM investigation and analysis. Mining trading patterns that comply with MLM transaction rules from historical transaction data is challenging for two main reasons. On the one hand, MLM transaction records form a minority of the data, meaning that they are sparse in the data set and cannot be captured using absolute frequency thresholds. On the other hand, special strategies for mining
MLM transaction rules are needed because the rules are complex and highly abstract. In response to these challenges, a sequence denoising algorithm based on folding fuzzy subsequences is proposed to shrink the search space of the subsequent pattern mining process. The algorithm reduces the sparsity of potential trading patterns while preserving the sequence structures. Given the denoised sequences, this paper proposes a sequential pattern mining algorithm based on comparative constraints, which are formalized by comparing the sequences with two introduced comparison samples. Experimental results on the pyramid-scheme trading pattern mining task show that most baseline methods are almost ineffective at mining MLM-related pattern items. In contrast, our model effectively discovers both trading data items and trading patterns.
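Contribution 1) scores node similarity attribute by attribute, with each attribute mapped into its own hash space. The minimal Python sketch below illustrates that multi-space structure. It is not the supervised, learned hashing algorithm described above. As a stand-in, it uses unsupervised random-hyperplane hashing (SimHash-style locality-sensitive hashing). The attribute names, their dimensions, the class `MultiSpaceHasher`, and the averaging of per-attribute Hamming similarities are all hypothetical choices made for illustration.

```python
import random


def make_hyperplanes(dim, n_bits, rng):
    """Draw n_bits random Gaussian hyperplanes for one attribute space."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return [1 if sum(w * x for w, x in zip(p, vec)) >= 0 else 0 for p in planes]


class MultiSpaceHasher:
    """Hypothetical stand-in: an independent random hash space per attribute.

    The thesis learns its hash functions with supervision; this sketch
    uses random hyperplanes instead, so it only shows the structure.
    """

    def __init__(self, attr_dims, n_bits=32, seed=0):
        rng = random.Random(seed)
        self.n_bits = n_bits
        self.planes = {name: make_hyperplanes(d, n_bits, rng)
                       for name, d in attr_dims.items()}

    def encode(self, node):
        """Map a node (dict: attribute name -> vector) to one code per space."""
        return {name: hash_code(node[name], planes)
                for name, planes in self.planes.items()}

    def similarity(self, a, b):
        """Average of per-attribute Hamming similarities, plus the breakdown."""
        ca, cb = self.encode(a), self.encode(b)
        per_attr = {
            name: 1 - sum(x != y for x, y in zip(ca[name], cb[name])) / self.n_bits
            for name in ca
        }
        return sum(per_attr.values()) / len(per_attr), per_attr
```

Keeping the per-attribute scores separate, rather than hashing one concatenated vector, lets an analyst see which relationship attribute makes two accounts look alike.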
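Contribution 3) lets the pheromone accumulated on an edge stand in for the funds that the edge carries. That idea might be sketched as a plain ant colony loop under stated assumptions. This is not the optimized transfer and update mechanism of the thesis. The assumptions are as follows:

- Ants start at accounts with no incoming transfers, assumed here to be bottom members.
- Each ant chooses the next payee with probability proportional to pheromone^alpha * amount^beta.
- Each ant deposits the bottleneck (smallest) amount along its path.
- The main path is then read off greedily along the strongest trail.

The function name `discover_main_path` and all parameter defaults are hypothetical.

```python
import random
from collections import defaultdict


def discover_main_path(edges, n_ants=20, n_iter=30, alpha=1.0, beta=1.0,
                       rho=0.1, max_steps=50, seed=0):
    """Hypothetical ant-colony sketch of main fund-path discovery.

    edges: list of (payer, payee, amount) tuples with positive amounts.
    Returns (main_path, pheromone) where pheromone maps (u, v) -> float.
    """
    rng = random.Random(seed)
    out = defaultdict(list)                 # payer -> [(payee, amount), ...]
    indeg = defaultdict(int)
    nodes = set()
    for u, v, amt in edges:
        out[u].append((v, amt))
        indeg[v] += 1
        nodes.update((u, v))
    tau = {(u, v): 1.0 for u, v, _ in edges}            # initial pheromone
    # Assumption: "bottom" members are accounts that only send money.
    sources = sorted(n for n in nodes if indeg[n] == 0) or sorted(nodes)
    scale = max(amt for _, _, amt in edges)             # normalise deposits

    for _ in range(n_iter):
        deposits = defaultdict(float)
        for _ in range(n_ants):
            node = rng.choice(sources)
            visited, path = {node}, []
            for _ in range(max_steps):
                choices = [(v, a) for v, a in out[node] if v not in visited]
                if not choices:
                    break                   # reached the top or a dead end
                weights = [tau[(node, v)] ** alpha * a ** beta
                           for v, a in choices]
                v, a = rng.choices(choices, weights=weights, k=1)[0]
                path.append((node, v, a))
                visited.add(v)
                node = v
            if path:
                # A path moves no more money than its smallest transfer.
                flow = min(a for _, _, a in path) / scale
                for u, v, _ in path:
                    deposits[(u, v)] += flow
        for e in tau:                       # evaporate, then add deposits
            tau[e] = (1 - rho) * tau[e] + deposits[e]

    # Read off the main path greedily along the strongest pheromone trail.
    start = max(sources, key=lambda s: max(
        (tau[(s, v)] for v, _ in out[s]), default=0.0))
    main_path, seen = [start], {start}
    while True:
        nxt = [v for v, _ in out[main_path[-1]] if v not in seen]
        if not nxt:
            return main_path, tau
        best = max(nxt, key=lambda v: tau[(main_path[-1], v)])
        main_path.append(best)
        seen.add(best)
```

Consider a toy network with a heavy A to B to C route and a light A to D to C route. The heavy route should collect almost all of the pheromone. Depositing the bottleneck amount is only one design choice. Depositing the total or mean transferred amount would rank paths differently.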
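The comparative frequency condition in contribution 4) can be illustrated as a ratio test. A pattern is kept when its relative support among the target sequences is sufficiently higher than in both comparison samples. A pattern that is rare in the data set as a whole can therefore still pass, which an absolute frequency threshold would not allow. The Python sketch below adopts that ratio test as an assumed formalization. The following are all hypothetical:

- the thresholds;
- the brute-force candidate enumeration;
- the event tokens;
- the names `comparative_patterns` and `rel_support`.

The folding-based denoising step is omitted.

```python
from itertools import combinations


def is_subsequence(pattern, seq):
    """True if the items of pattern occur in seq in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def rel_support(pattern, sequences):
    """Fraction of sequences that contain the pattern."""
    if not sequences:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def comparative_patterns(target, comp_a, comp_b, max_len=3,
                         min_support=0.2, min_ratio=2.0, eps=1e-9):
    """Hypothetical comparative-frequency filter.

    Keeps patterns whose relative support in `target` is at least
    min_support and at least min_ratio times their support in the more
    frequent of the two comparison samples.
    Returns {pattern: (target_support, comparison_support)}.
    """
    # Brute force: every subsequence of length <= max_len of every target
    # sequence. This grows combinatorially and is only for illustration.
    candidates = set()
    for seq in target:
        for k in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), k):
                candidates.add(tuple(seq[i] for i in idx))

    result = {}
    for p in candidates:
        sup_t = rel_support(p, target)
        if sup_t < min_support:
            continue
        baseline = max(rel_support(p, comp_a), rel_support(p, comp_b))
        if sup_t / max(baseline, eps) >= min_ratio:
            result[p] = (sup_t, baseline)
    return result
```

In the test, the target sample holds suspected accounts and there are two hypothetical comparison samples. A pattern such as `("join_fee", "pay_upline")` passes because it never occurs in either comparison sample. By contrast, `("withdraw",)` is rejected because it is common among ordinary accounts. A practical miner would grow patterns level by level, as PrefixSpan does, and prune with the support threshold instead of enumerating all subsequences.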