Research On Telecom Feeset Matching Problem Based On Ensemble Learning

Posted on: 2020-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhao
GTID: 2439330578953313
Subject: Applied Statistics
Abstract/Summary:
Increasingly fierce competition among telecom operators has led the major operators to adjust their strategies to attract new customers and retain existing ones. A telecom feeset (tariff package) can be regarded as a kind of "service" that the operator provides to the customer. Choosing a suitable and affordable feeset for each user is especially important, because it can improve customer satisfaction to a certain extent and thus increase customer loyalty. A large customer base is a decisive advantage for an operator that wants to stand out in this competition. Operators have long recognized this, so a wide variety of feesets is on the market, which makes it much harder for customers to choose a package themselves. It is therefore imperative for operators to choose the right package for their customers.

This thesis uses the historical user data set published by the China Unicom Research Institute to study telecom feeset matching. The data set contains 11 packages in total, which makes this a typical multi-class classification problem, and the classes are severely imbalanced. The work is based on ensemble learning and builds a matching model that selects the most appropriate package for existing customers. First, the theory, advantages, and disadvantages of the algorithms used in the study are introduced in detail. The data set is then cleaned and descriptively analyzed in Python. Three methods are used to fill in missing values: mean imputation, mode imputation, and KNN imputation. K-means clustering is applied to the user consumption variables to analyze user behavior and to classify users as "ordinary users", "VIP users", or "super VIP users". Operators can upgrade a user's existing feeset, or match the user to a more suitable one, according to this user level. For feature construction, the statistical information of the existing features is used to derive seven new features, which are added to the original ones. For feature selection, three tree models are fitted to the processed data set, and the union of the 30 most important features from each model's importance ranking is taken as the final set of modeling features.

The related literature mostly relies on single classifiers such as KNN, naive Bayes, and logistic regression. This thesis instead applies ensemble learning theory and builds a random forest model, an AdaBoost model, and a GBDT+RF+AdaBoost model. The GBDT+RF+AdaBoost model uses the Stacking combination strategy: the RF and AdaBoost models serve as the primary learners, and GBDT serves as the secondary learner that fuses the classification results of the two primary learners. Comparing the F1 and AUC values of the three classifiers shows that all three perform well and that the fused model improves on the performance of the primary classifiers to a certain extent. The three single classifiers mentioned above were then built on the same data set; their highest F1 value is 0.6607, whereas the lowest F1 value among the three ensemble models is 0.8016. This indicates that ensemble learning is better suited to the telecom feeset matching problem.
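The Stacking strategy described in the abstract (RF and AdaBoost as primary learners, GBDT as the secondary learner) can be sketched with scikit-learn as follows. This is only an illustrative sketch, not the thesis's implementation: the synthetic four-class data, the hyperparameters, the 3-fold setting, and the choice of macro-averaged F1 are all assumptions, since the abstract does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced multi-class data (assumption: a small stand-in for
# the China Unicom data set, which has 11 classes and is not reproduced here).
X, y = make_classification(
    n_samples=600,
    n_features=20,
    n_informative=8,
    n_classes=4,
    weights=[0.5, 0.3, 0.15, 0.05],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Primary learners: random forest and AdaBoost (hyperparameters are assumed).
primary_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
]

# Secondary learner: GBDT, fitted on the cross-validated class probabilities
# that the primary learners produce for the training set.
stacked_model = StackingClassifier(
    estimators=primary_learners,
    final_estimator=GradientBoostingClassifier(n_estimators=50, random_state=0),
    cv=3,
)
stacked_model.fit(X_train, y_train)

# Evaluate with macro-averaged F1 (assumption: the averaging mode is not
# stated in the abstract).
y_pred = stacked_model.predict(X_test)
macro_f1 = f1_score(y_test, y_pred, average="macro")
print(f"macro F1 of the stacked model: {macro_f1:.4f}")
```

Fitting the secondary learner on cross-validated predictions, rather than on predictions for data the primary learners were trained on, is what keeps the GBDT from simply memorizing overfit primary outputs.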
Keywords/Search Tags: telecom feeset matching, random forest, AdaBoost, GBDT, Stacking combination strategy