
Research And Implementation Of Abnormal Telephone Identification Method Based On CDR

Posted on: 2023-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: G L C Shang
GTID: 2568306809471044
Subject: Computer technology
Abstract/Summary:
Communication technology has developed rapidly and chat tools continue to emerge, yet mobile phone calls are still regarded as the most convenient and efficient way to communicate. With the advent of the big data era, users' private information is leaked and exploited by criminals, resulting in a large number of fraudulent, harassing, and other abnormal calls. Reports indicate that in 2020 alone, 1.5 billion calls were marked as harassing calls. Such calls cause great distress to mobile phone users and may even lead to loss of life and property. Operators sit in the middle of every call and collect users' call information. If appropriate data mining techniques are applied to these data to build a model that actively identifies abnormal calls, such calls can be identified and intercepted in advance. This would help create a better call environment for society and mobile phone users, and would benefit the operator's brand image. Against this background, this thesis carries out the following three pieces of work:

(1) Feature construction on the original CDR (Call Detail Record) data. Fields useful for modeling are selected from the original CDR and analyzed from four aspects: call duration, call frequency, call initiation time period, and call location. From these, 17 call features are constructed, including call frequency and total call duration, that reflect the differences between ordinary calls and abnormal calls. The rationality and applicability of these 17 features are verified by correlation analysis and feature importance analysis.

(2) A cluster-based denoising sampling algorithm is proposed to denoise the dataset and balance the number of positive and negative samples. The data provided by operators have two problems: the dataset contains a certain number of meaningless samples, and the gap between the number of abnormal calls and the number of ordinary calls is too large. The traditional SMOTE (Synthetic Minority Oversampling Technique) sampling algorithm cannot eliminate the interference of noisy data. This thesis therefore proposes a clustering-based denoising sampling algorithm that balances positive and negative samples while removing noise. The dataset processed in this way performs well in the classification model: the accuracy, precision, recall, F1 value, AUC value, and G-mean value of the model are all the best in the comparison, which demonstrates the rationality and superiority of the proposed data processing.

(3) Improvements to the traditional Stacking model. After analyzing the disadvantages of the traditional Stacking model, this thesis proposes an improved Stacking algorithm based on feature weighting. First, four classification algorithms (decision tree, random forest, XGBoost, and SVM) are selected as the primary learners in the Stacking ensemble, and logistic regression is selected as the secondary learner. Accuracy weighting is then applied to obtain the improved Stacking algorithm. Finally, the performance of the Stacking algorithm before and after the improvement is compared experimentally. The results show that, given the same input dataset, the AUC value of the improved Stacking algorithm is 2.7% higher than that of the traditional Stacking algorithm and 4.7% higher than that of XGBoost, the best-performing single algorithm. The Stacking algorithm achieves a recognition accuracy of more than 90% for abnormal phone calls and has good stability and generalization ability.

The original CDR data go through feature construction, denoising, and sampling, and are then input into the improved Stacking algorithm for training. The resulting abnormal phone identification model has an accuracy of 92.9%, a precision of 94.4%, a recall of 91.8%, an F1 value of 93.1%, and an AUC value of 92.9%. This shows that the model in this thesis can accurately and actively identify abnormal phones and has certain performance advantages over the single models and the traditional model. It provides a reference for operators to manage abnormal calls and create a green call environment in the future.
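
For readers less familiar with the evaluation metrics named in the abstract, the following is a minimal Python sketch of how accuracy, precision, recall, F1, and G-mean are commonly computed from a binary confusion matrix. It is not the thesis's code: the function names and the example counts are hypothetical, and G-mean is taken here as the geometric mean of recall and specificity, which is the usual definition but is an assumption about the thesis. The last lines check that the reported recall of 91.8% and F1 value of 93.1% are consistent when the 94.4% figure is read as precision.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    tp/fn: abnormal calls that were caught/missed (positive class).
    tn/fp: ordinary calls that were passed/wrongly flagged (negative class).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # Assumed definition: geometric mean of recall and specificity.
        "g_mean": math.sqrt(recall * specificity),
    }


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not from the thesis).
example = classification_metrics(tp=90, fp=5, tn=95, fn=10)

# Consistency check on the figures quoted in the abstract.
reported_f1 = f1_from_precision_recall(0.944, 0.918)  # ≈ 0.931
```

G-mean is useful on imbalanced data such as abnormal-call records because it drops sharply when either class is recognized poorly, whereas accuracy can stay high when a model simply favors the majority class.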
Keywords/Search Tags: Abnormal call, Feature construction, Improved Stacking, Ensemble learning