
Research On Automatic Recognition And Analysis Of Telecom Fraud Based On Tree Model

Posted on: 2024-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: B T Li
GTID: 2556307079492624
Subject: Electronic Information · Computer Technology (Professional Degree)
Abstract/Summary:
With the continuous advancement of information technology, information and communication services have developed rapidly. As the telecommunications market has expanded, however, telecommunications fraud has become increasingly serious and has caused huge economic losses to operators. At the same time, the explosive growth in data volume has made working scenarios more complex, and it is becoming more and more difficult to identify and analyze fraudulent users in massive data. Reducing the workload of data analysts and improving their efficiency has become an urgent problem. This thesis takes the identification and analysis of SIMBOX telecom fraud as its task scenario, focuses on ensemble learning, automated analysis, and fraudulent user portraits, and carries out the following research.

Firstly, an automatic classification model based on tree models is constructed. Every stage, from data reading and data preprocessing to model training and prediction, is processed automatically. The model supports reading data of six different text types, four methods of filling missing values, two methods of discretizing continuous variables, two encoding methods for discrete variables, and four tree-based ensemble learning algorithms. Users can select the variables to be modeled according to their specific needs and set the ratio of positive to negative samples themselves. The model can adapt to most classification scenarios and can be tuned through parameter adjustment. This frees data analysts from the tedious intermediate steps so that they can spend more energy on observing and processing the data itself.

Secondly, an automated analysis of SIMBOX telecom fraud user data is completed. Starting from observation and analysis of the raw data, the work carries out data cleaning and feature derivation, and compares four ensemble algorithms and nine positive-to-negative sample ratios. Six metrics (accuracy, precision, recall, F1 score, AUC, and KS score) are used to evaluate prediction performance and to ensure higher model accuracy.

Finally, a SIMBOX telecom fraud user profile is constructed. Twenty-five variables are selected from the telecommunications fraud data, the behavioral characteristics of fraudulent users are analyzed along 13 dimensions, and fraudulent user portraits are built, so that operators can understand fraudulent users more clearly and prevent fraudulent behavior in a more targeted way.
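The six evaluation metrics named in the abstract can be illustrated with a small, dependency-free sketch. This is not the thesis's implementation: the function names `roc_points` and `evaluate`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. The KS score is taken here as the largest gap between the true positive rate and the false positive rate over all score thresholds, which is the usual definition in credit and fraud scoring.

```python
from typing import Dict, List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs, one per distinct score, from strictest to loosest threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present to build a ROC curve")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Consume every sample that shares this score so ties form one point.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1, AUC and KS for binary labels (1 = fraud)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]  # threshold is an assumed default
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

    pts = roc_points(y_true, scores)
    # AUC by the trapezoidal rule over consecutive ROC points.
    auc = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    # KS: largest separation between the TPR and FPR curves.
    ks = max(abs(tpr - fpr) for fpr, tpr in pts)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": auc,
        "ks": ks,
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    labels = [1, 1, 0, 1, 0, 0]
    model_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
    for name, value in evaluate(labels, model_scores).items():
        print(f"{name}: {value:.4f}")
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, while AUC and KS are computed from the full ranking of scores. That difference is one reason a comparison across several positive-to-negative sample ratios would report both kinds of metric.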
Keywords/Search Tags: Telecom Fraud, Data Analysis, Automated Modeling, Machine Learning