
User Churn Prediction Analysis In Music Streaming

Posted on: 2019-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: H F Long
GTID: 2417330566475735
Subject: Applied Statistics
Abstract/Summary:
As competing products continually enter the market, users are increasingly free to choose the music streaming service that interests them, which raises the risk of user loss on every platform. However, a churn model does not stay effective indefinitely, because user data in music streaming is updated quickly. In addition, churn early-warning models behave differently on data sets from different industries and with different characteristics, so their results cannot be generalized. To keep a churn warning system timely, new research must therefore be conducted on new services and new data sets. In short, research on user churn prediction in music streaming is needed to forecast accurately the loss of paying users in the subscription business, which is crucial to the long-term success of a platform.

To address these issues, and considering the need of a music streaming service platform to predict the churn of paying users, this paper takes the user records generated by the platform from January 1, 2015 to February 28, 2017 as its original data. Data mining methods are used to study whether a subscriber will churn within 30 days after the current membership expires. The research covers two aspects, approached through classification analysis and cluster analysis respectively: the churn prediction model, and the reasons for user churn together with a user segmentation model.

First, for the churn prediction model, Python is used to clean each data set and to explore the characteristics of the music streaming data through univariate and multivariate feature comparisons. Based on the results of this feature analysis, the cut-off dates and six time windows used to divide the training set and test set are set. Then, based on the RFM model, the six time windows, and statistics of the original features, detailed feature engineering is carried out to generate features that affect churn prediction. To address possible feature redundancy, feature selection is performed by training an XGBoost model. In the modeling phase, the model is trained with a 5-fold stacking strategy: in the first level, the five base models are trained using Random Forest, ExtraTrees, and GBDT; their predictions are then used as inputs to the second-level XGBoost model, which integrates them. The model reached an AUC of 0.878, indicating good predictive performance and practical value, and its prediction accuracy improved on that of the base models.

Second, comparing the predictions obtained with the original feature set and with the new feature set shows that detailed feature engineering can effectively improve the accuracy of the prediction model.

Finally, for the analysis of churn reasons and user segmentation, the K-means clustering algorithm is combined with PCA dimensionality reduction for visualization to segment music streaming users into four categories. The study finds that user value segmentation based on the extended RFM model overcomes the limited set of indicators in the traditional RFM model.
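The churn definition and the windowed RFM features described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the record layout, the function names `churn_label` and `rfm_features`, the six window lengths, the sample amounts and dates, and the choice to count a renewal only strictly after the expiry date are all assumptions made for illustration. Only the 30-day churn horizon and the use of six time windows come from the text.

```python
from datetime import date, timedelta

# Assumed window lengths in days; the abstract says only that six windows were used.
WINDOW_DAYS = (7, 14, 30, 60, 90, 180)


def churn_label(expire_date, later_transactions, grace_days=30):
    """Return 1 (churn) if no transaction date falls within grace_days
    after expire_date, otherwise 0 (renewed)."""
    deadline = expire_date + timedelta(days=grace_days)
    renewed = any(expire_date < d <= deadline for d in later_transactions)
    return 0 if renewed else 1


def rfm_features(transactions, cutoff, window_days=WINDOW_DAYS):
    """Compute recency plus per-window frequency and monetary totals.

    transactions: list of (transaction_date, amount_paid) pairs (assumed layout).
    Only transactions on or before the cut-off date are used, so the
    features never see information from the prediction period.
    """
    past = [(d, amt) for d, amt in transactions if d <= cutoff]
    features = {}
    # Recency: days since the most recent transaction, None if there is none.
    features["recency_days"] = (
        (cutoff - max(d for d, _ in past)).days if past else None
    )
    for w in window_days:
        start = cutoff - timedelta(days=w)
        in_window = [amt for d, amt in past if d > start]
        features[f"frequency_{w}d"] = len(in_window)   # number of payments
        features[f"monetary_{w}d"] = sum(in_window)    # total amount paid
    return features


# Hypothetical subscriber with three monthly payments before the cut-off.
cutoff = date(2017, 1, 31)
history = [
    (date(2016, 11, 15), 149),
    (date(2016, 12, 15), 149),
    (date(2017, 1, 15), 149),
]
feats = rfm_features(history, cutoff)
label = churn_label(date(2017, 2, 15), [date(2017, 3, 1)])
print(feats["recency_days"], feats["frequency_30d"], feats["monetary_60d"], label)
```

Restricting every feature to transactions on or before the cut-off date keeps the training features separate from the period in which churn is observed, which is the purpose of the cut-off dates mentioned above.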
Keywords/Search Tags:user churn prediction, two-stage stacking, GBDT algorithm, XGBoost algorithm, user segmentation