Predicting The Popularity Of Online News

Posted on: 2019-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Wei
GTID: 2427330545951179
Subject: Applied statistics
Abstract/Summary:
News develops dynamically and generally has a short life cycle. To become popular, a news item must reach a large number of readers within a relatively short period of time, so some of its attributes must match the interests of most users. It is therefore important to extract and study the features that determine the popularity of online news, analyze the relationships among them, and build a model that describes how popularity depends on these features. This thesis selects 47 features of news articles and frames popularity prediction as a binary classification problem. The analysis consists of the following steps: data preprocessing, descriptive analysis, model building, model evaluation, and conclusions. The models I chose are logistic regression, a CART decision tree, random forest, and XGBoost. Because the variables in the logistic regression are correlated, I add a penalty term to the objective function to mitigate multicollinearity. I tried both L1 and L2 regularization, and L1 performed better. For parameter setting, 10-fold cross-validation is used to determine the optimal parameters; for XGBoost in particular, the parameters are also tuned by grid search. For model evaluation, accuracy, precision, recall, F1, and AUC were used to assess all models comprehensively, and XGBoost gave the best predictive performance. Finally, recommendations for improving the popularity of news are derived.
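
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the thesis's code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. Popular articles are coded as 1 and unpopular ones as 0, and AUC is computed as the fraction of (popular, unpopular) pairs in which the popular article receives the higher score.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = popular)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true labels and model scores for eight articles.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(classification_metrics(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC depends only on how the scores rank the articles, which is one reason to report both kinds of metric when comparing models.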
Keywords/Search Tags:Online news popularity, logistic regression, decision tree, random forest, XGBoost