Research On Fraud Number Identification Based On Communication Characteristics

Posted on: 2022-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Jiang
GTID: 2506306773993119
Subject: Telecom Technology
Abstract/Summary:
Telecom fraud, the most frequent form of criminal fraud in today's society, has spread across the country like a virus. For telecom operators, accurately identifying fraud numbers and issuing timely early warnings by technical means is an important proactive measure for preventing and controlling telecom fraud. However, mining useful features from massive volumes of communication data to build a recognition model is a great challenge. This thesis therefore starts from real users' detailed call detail record (CDR) data. First, building on existing research on call characteristics, it further mines their distribution characteristics and adds features constructed from the SMS and traffic dimensions, completing a multi-dimensional, comprehensive feature system for fraud number identification in order to improve identification accuracy. Second, several models (Random Forest, AdaBoost, XGBoost, LightGBM and CatBoost) are trained on the dataset constructed from this feature system to verify its validity and feasibility. The prediction results on the test set show that CatBoost performs best in precision, recall and accuracy, and feature interpretation of the CatBoost model using the SHAP framework also indirectly supports the rationality of the feature construction. Finally, building on the traditional Stacking method, feature sampling and learner weighting are applied to model fusion to further improve prediction performance. Compared with the best base model, CatBoost, precision improves by about 2% while recall remains essentially unchanged, and the prediction results are also better than those of the traditional Stacking method. In summary, this thesis constructs a multi-dimensional feature system for fraud number identification and achieves good results through multi-model training on offline datasets. The overall approach can serve as a reference for telecom operators in the online detection and timely warning of fraudulent numbers.
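
To make the idea of a multi-dimensional feature system concrete, the following is a minimal sketch of how per-number features might be aggregated from CDR-style records across the call, SMS and traffic dimensions. The record layout, the sample data and every feature name (`call_count`, `distinct_callee_ratio`, and so on) are hypothetical illustrations, not the feature set actually used in the thesis.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical CDR-style records: (number, record_type, peer, value).
# For calls, value is the duration in seconds; for data, bytes used.
records = [
    ("A", "call", "x1", 12),
    ("A", "call", "x2", 9),
    ("A", "call", "x3", 15),
    ("A", "sms", "x4", 0),
    ("B", "call", "y1", 240),
    ("B", "call", "y1", 180),
    ("B", "data", None, 5_000_000),
]


def build_features(records):
    """Aggregate one feature row per number (illustrative features only)."""
    durations = defaultdict(list)   # call durations per number
    callees = defaultdict(set)      # distinct called parties per number
    sms_count = defaultdict(int)    # SMS records per number
    traffic = defaultdict(int)      # data usage in bytes per number
    numbers = set()

    for number, kind, peer, value in records:
        numbers.add(number)
        if kind == "call":
            durations[number].append(value)
            callees[number].add(peer)
        elif kind == "sms":
            sms_count[number] += 1
        elif kind == "data":
            traffic[number] += value

    features = {}
    for number in sorted(numbers):
        calls = durations[number]
        n_calls = len(calls)
        features[number] = {
            # Call dimension: volume and how widely calls are spread.
            "call_count": n_calls,
            "distinct_callee_ratio": len(callees[number]) / n_calls if n_calls else 0.0,
            # Distribution of call durations (mean and population std).
            "mean_call_duration": mean(calls) if calls else 0.0,
            "std_call_duration": pstdev(calls) if calls else 0.0,
            # SMS and traffic dimensions.
            "sms_count": sms_count[number],
            "traffic_bytes": traffic[number],
        }
    return features


features = build_features(records)
for number, row in features.items():
    print(number, row)
```

A table built this way, one row per number, is the kind of input that tree-based classifiers such as those named in the abstract could be trained on. Which features actually separate fraudulent from normal numbers is an empirical question that this sketch does not answer.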
Keywords/Search Tags: Fraud Identification, Feature Engineering, Model Merging, SHAP Framework