| In recent years, the swift advancement of mobile communication networks and intelligent terminal devices has closely tied people’s daily lives to these networks. As people enjoy the conveniences brought by mobile network services, the number of telecom fraud cases is soaring globally, inflicting substantial losses on individuals and businesses and even threatening social stability. Consequently, employing technical methods to effectively detect fraudsters is of paramount importance. However, the current state of telecom fraud detection is daunting. On the one hand, as fraudsters’ anti-detection capabilities keep improving, previous detection methods struggle to remain effective. On the other hand, most current detection methods treat users as independent entities and overlook their communication relationships, which results in unsatisfactory detection outcomes. Graphs, as non-Euclidean structured data, can aptly depict and model entity relationships, and they have been extensively applied in fraud detection domains such as telecommunications, finance, and social networking in recent years. The interactions between mobile communication network users are inherently relational, so representing communication data as a graph can effectively retain user behavioral information and aid in detecting telecom network fraud. Moreover, graph machine learning technology has achieved significant breakthroughs in recent years. It has a strong capacity for representing complex graph-structured data, enabling the extraction of hidden information from the nodes and edges of a graph. It can therefore map nodes to low-dimensional spaces, facilitating downstream tasks such as classification and clustering. Furthermore, it supports both semi-supervised and unsupervised learning. Hence, utilizing graph machine learning technology for telecom network fraud detection has emerged as a highly promising research area. However, after investigating the 
existing research, it is found that the detection of telecom network fraud based on graph machine learning technology faces the following key challenges: 1) Owing to the use of various mobile phone applications, the graph generated from phone calls and text messages between users is highly sparse, making it difficult to provide sufficient neighborhood information for mining fraudsters; 2) Besides numerous front-line fraudsters, there are behind-the-scenes fraudsters operating at different command levels within fraud gangs, who pose a substantial threat and are difficult to detect; 3) Fraudulent users consistently represent a tiny fraction of all users, leading to graph imbalance that may bias the graph machine learning model and severely impair fraudster detection; 4) In the face of differentiated and unbalanced graph data, devising an adaptive learning detection model that achieves optimal fraud detection performance remains a challenge. To tackle these challenges, this paper analyzes the underlying causes, abstracts the corresponding scientific issues, designs targeted graph neural network models, and conducts comprehensive research on key technologies for telecommunications network fraud detection. The main research findings are as follows: 1. To solve the problem of fraud detection in sparse communication connection graph data, this paper proposes an end-to-end telecom fraud detection framework based on user behavior graph reconstruction. The framework consists of three parts: a feature extraction module, a graph reconstruction module, and a graph neural network module. First, statistical and pattern features are extracted from Call Detail Records (CDR) metadata, and the original features are transformed; then, features are screened and a graph is constructed according to user behavior similarity; finally, the node features and topology are input into a graph convolutional neural network for node representation learning. Extensive experiments on 
real-world sparsely connected CDR datasets demonstrate the effectiveness of the proposed method. The proposed framework bridges sparse graph data and graph machine learning, and can be used in other anomaly detection scenarios where graphs are absent or sparse. 2. To solve the problem of behind-the-scenes fraudster detection in telecom network fraud, this paper reveals the phenomenon of behind-the-scenes fraudsters in real-world telecom data and designs an ensemble graph neural network model for detection. First, multiple graph attention networks are used as base classifiers to learn fraudster features at different depths, where inner-layer attention learns the attention weight function of node neighbors and inter-layer attention differentiates the attention networks from one another. Subsequently, the ensemble learning algorithm SAMME.R updates node weights between layers, so that nodes misclassified by the current base classifier receive greater weight in the subsequent classifier. Finally, the multiple base classifiers are combined to learn node embeddings for node classification. Experimental results on two real-world telecommunication network fraud detection datasets demonstrate that the proposed model effectively detects behind-the-scenes fraudsters in telecommunication networks and outperforms current state-of-the-art baseline methods. 3. To solve the problem of fraud detection in unbalanced graphs of telecom networks, this paper proposes a cost-sensitive ensemble graph neural network model, which guides the model to focus on minority nodes in the graph by introducing cost-sensitive learning. First, a graph attention network is used as the base classifier to learn node embeddings, which are then input into the corresponding cost-sensitive learner to calculate misclassification costs and update the node weights of the subsequent base classifier accordingly. Next, the weights are used to constrain the loss function, guiding the training 
process of the base classifier. Finally, the cost-sensitive embeddings from all base classifiers are aggregated to output the predicted node class. Extensive experiments on two real-world telecom fraud detection datasets show that the proposed model outperforms multiple state-of-the-art baseline methods and can effectively handle graph data with imbalanced class distributions. 4. To solve the problem of fraud detection in differentiated unbalanced graph data, this paper proposes a cost-adaptive graph neural network model, which learns from diverse unbalanced graph data through unbalanced neighbor sampling and adaptive learning of the cost matrix. First, a reinforcement learning-based neighbor sampler is designed, and an appropriate sampling threshold is trained using a reward-punishment mechanism. Next, neighbors are sampled based on the similarity between each neighbor node and the central node, thus initially mitigating the graph data imbalance problem. Subsequently, the resulting sampled graph is input into the cost-sensitive learner, which uses a GNN for message aggregation to obtain node embeddings. The optimization objective is constructed from the sample distribution matrix, the node embedding scatter matrix, and the classification confusion matrix, and the backpropagation algorithm is used to jointly optimize the cost matrix and the GNN. Finally, the resulting cost-sensitive node embeddings are used for fraudulent node detection. In addition, this paper theoretically proves the effectiveness of combining the gradient descent-based automatic cost matrix learning method with GNNs. Extensive experiments on two real-world telecom network fraud detection datasets demonstrate that the proposed method can effectively handle adaptive learning on differentiated unbalanced graph data. Furthermore, considering the scarcity of relevant research resources, this paper reorganizes and releases two real-world telecom network fraud detection datasets derived from public internet 
resources. These datasets are adapted to the deep graph learning framework Deep Graph Library (DGL), along with more than ten state-of-the-art graph machine learning algorithms and fraud detection models. The datasets and source code corresponding to the four research points of this paper can be found at: https://github.com/xxhu94. |
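
As an illustration of the graph reconstruction step in the first research point, where a graph is constructed according to user behavior similarity, the following minimal sketch connects each user to its most similar peers. The abstract does not specify the similarity measure or the connection rule, so cosine similarity and a k-nearest-neighbor rule are assumptions made here for illustration only. The function names and the toy feature vectors are likewise hypothetical and are not taken from the released source code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_similarity_graph(features, k=2):
    """Link each user to its k most similar users and return sorted undirected edges.

    Assumption: a k-nearest-neighbor rule on cosine similarity stands in for the
    "user behavior similarity" criterion, which the abstract does not define.
    """
    n = len(features)
    edges = set()
    for i in range(n):
        # Score every other user against user i.
        scored = [
            (cosine_similarity(features[i], features[j]), j)
            for j in range(n)
            if j != i
        ]
        # Highest similarity first; ties broken by the smaller user index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for _, j in scored[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# Hypothetical per-user behavior features, e.g. statistics extracted from CDR metadata.
features = [
    [1.0, 0.1],
    [0.9, 0.2],
    [0.1, 1.0],
    [0.2, 0.9],
]
edges = build_similarity_graph(features, k=1)
print(edges)
```

With these toy features, users 0 and 1 behave alike, as do users 2 and 3, so the reconstructed graph links each pair even though no call or message between them was recorded. The resulting edge list, together with the node features, is the kind of input a graph convolutional network would then consume.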