| With the development of information technology and the maturing of communication technology, the operators and services available to communication users have diversified, and market competition is particularly fierce. While acquiring new subscribers, operators will increasingly need to focus on retaining existing ones. Accurately and effectively identifying whether a subscriber is likely to stay or leave, so as to reduce churn, is therefore of great importance to the communications industry. This paper uses data mining techniques to study and improve early-warning models of communication subscriber churn, with the aim of building an effective predictive model that can be applied to practical problems. Starting from the research background and the current state of research on communication subscriber churn, the paper first reviews the theoretical foundations of the algorithms used and the relevant implementation techniques. It then introduces the dataset and conducts an exploratory analysis, dividing the feature variables into three groups (user behavioural attributes, user attributes and product attributes) to explore the impact of each feature on subscriber churn and to offer suggestions based on the analysis. Guided by this analysis, data cleaning and feature selection complete the data preparation before modelling. In the study of churn prediction models, Logistic Regression, Support Vector Machine and Decision Tree are each used as single-algorithm classifiers, and a combination strategy is used to fuse them into a "Logistic Regression + Support Vector Machine + Decision Tree" model with slightly better results. Two ensemble models are then used for predictive classification: Random Forest, based on Bagging, and AdaBoost, based on Boosting. Finally, evaluation metrics are used to compare the fused single-algorithm model and the two ensemble models on the initially processed dataset. The accuracy of the models was above 81% and the AUC values were around 0.86. Overall, the AdaBoost ensemble model performed best, followed by the fused single-algorithm model. Building on these results, an improvement incorporating a generative adversarial network (GAN) is proposed to further investigate the churn prediction model. First, the data is processed with a GAN: generator and discriminator networks are constructed and trained to generate a new training set and mitigate the imbalance in the dataset. GAN-based churn prediction models are then constructed and tuned, applying the improvement to the fused single-algorithm model and to the ensemble models respectively. The accuracy of the fused single-algorithm model was 89.5%, the accuracy of the two ensemble models was above 93%, and the AUC values were above 0.94. The other metrics also improved significantly over the earlier results. In addition, four oversampling methods are compared with the GAN method, which demonstrates the feasibility of the GAN method for this problem. The Random Forest model incorporating the GAN improvement strategy achieved the best results overall. This study improves model accuracy and selects the best-performing model to help operators effectively predict potential churn and to provide decision support for addressing it. |
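
The abstract does not state which combination strategy is used to fuse the three single-algorithm classifiers. As one plausible reading, the sketch below assumes soft voting, in which the churn probabilities predicted by each model are averaged (optionally with weights) and then thresholded. The function names `soft_vote` and `predict_churn`, the 0.5 threshold and the example probabilities are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def soft_vote(
    prob_lists: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Fuse per-model churn probabilities by (weighted) averaging.

    prob_lists[m][i] is model m's predicted churn probability for subscriber i.
    Soft voting is an assumed strategy; the paper does not specify one.
    """
    n_models = len(prob_lists)
    if n_models == 0:
        raise ValueError("at least one model is required")
    if weights is None:
        weights = [1.0] * n_models  # equal weights by default
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    n_samples = len(prob_lists[0])
    if any(len(p) != n_samples for p in prob_lists):
        raise ValueError("all models must score the same subscribers")
    total = float(sum(weights))
    return [
        sum(w * p[i] for w, p in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


def predict_churn(fused: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a subscriber as churn (1) when the fused probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in fused]


# Made-up probabilities for three subscribers from the three base models.
lr_probs = [0.9, 0.2, 0.6]   # Logistic Regression
svm_probs = [0.8, 0.4, 0.3]  # Support Vector Machine
dt_probs = [1.0, 0.0, 0.5]   # Decision Tree

fused_probs = soft_vote([lr_probs, svm_probs, dt_probs])
churn_labels = predict_churn(fused_probs)
print(fused_probs, churn_labels)
```

Averaging probabilities rather than counting hard votes keeps each model's confidence in play, which is one common reason a fused model can edge out its individual members. Hard (majority) voting or stacking would be equally valid readings of "combination strategy".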