Box Office Prediction Based On XGBoost Algorithm

Posted on: 2021-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: M Yang
GTID: 2415330626961126
Subject: Applied statistics
Abstract/Summary:
With the development of society and the improvement of people's living standards, cultural entertainment has become the main way for people to relieve pressure. The rapid development of the film industry has also made it an important part of cultural entertainment. Unfortunately, only a small proportion of domestic films succeed, and the vast majority of films released each year struggle to recover their costs. Box office prediction is therefore particularly important. We select the factors that have the greatest impact on the box office and build a predictive model with better performance, to provide data support for movie production and publicity. This paper uses ensemble learning to study the box office. On two datasets, it verifies that the XGBoost algorithm outperforms the random forest algorithm and the GBDT algorithm. The first dataset is the one from the Kaggle box office prediction competition. In the data pre-processing step, some factors whose values do not change with the movie are quantified as specific numeric values, while the other factors are encoded directly as dummy variables. XGBoost is then used for feature selection: it produces a feature importance ranking, from which redundant attributes are deleted to reduce model complexity. The model with the relatively highest prediction accuracy is selected according to the evaluation metrics. Finally, cross-validation and grid search are used to tune the model parameters and improve prediction accuracy. The second dataset consists of the relevant data for 56 movies from the top 100 of the 2019 domestic box office rankings, on which a model is then built. On this dataset as well, the evaluation metrics show that the XGBoost model predicts the box office more accurately.
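The tuning step described in the abstract (dummy-variable encoding, then cross-validation combined with grid search) can be illustrated with a minimal, dependency-free sketch. It is not the thesis's implementation: plain gradient boosting over depth-1 regression stumps stands in for XGBoost, the data are synthetic, and every name here (`fit_boosting`, `cv_rmse`, `grid_search`, the `budget` and genre features, the parameter grid) is a hypothetical chosen for illustration.

```python
import random
from itertools import product


def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: fall back to predicting the mean residual
        mean = sum(residuals) / len(residuals)
        return 0, float("inf"), mean, mean
    return best[1:]


def fit_boosting(X, y, n_rounds, learning_rate):
    """Toy gradient boosting for squared loss (a stand-in for XGBoost)."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        preds = [p + learning_rate * (lm if row[j] <= t else rm)
                 for row, p in zip(X, preds)]
    return base, learning_rate, stumps


def predict(model, X):
    base, lr, stumps = model
    return [base + lr * sum(lm if row[j] <= t else rm for j, t, lm, rm in stumps)
            for row in X]


def rmse(y_true, y_pred):
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def cv_rmse(X, y, k, **params):
    """Average held-out RMSE over k interleaved folds."""
    scores = []
    for fold in range(k):
        held_out = list(range(fold, len(X), k))
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_boosting([X[i] for i in train], [y[i] for i in train], **params)
        scores.append(rmse([y[i] for i in held_out],
                           predict(model, [X[i] for i in held_out])))
    return sum(scores) / k


def grid_search(X, y, grid, k=4):
    """Try every parameter combination; return (best CV RMSE, best params)."""
    names = list(grid)
    results = []
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        results.append((cv_rmse(X, y, k, **params), params))
    return min(results, key=lambda r: r[0])


# Synthetic stand-in data: one numeric factor plus a genre encoded as dummies.
random.seed(0)
GENRES = ["action", "comedy", "drama"]
X, y = [], []
for _ in range(60):
    budget = random.uniform(1, 10)
    genre = random.choice(GENRES)
    dummies = [1.0 if genre == g else 0.0 for g in GENRES]
    X.append([budget] + dummies)
    y.append(3.0 * budget + 5.0 * dummies[0] + random.gauss(0, 1))

grid = {"n_rounds": [5, 20, 60], "learning_rate": [0.1, 0.5]}
best_score, best_params = grid_search(X, y, grid)
print(best_params, round(best_score, 3))
```

In practice the same loop would more likely be written with a library estimator and a ready-made grid search utility. The hand-rolled version only makes the mechanics visible: each parameter combination is scored on held-out folds, and the combination with the lowest average error is kept.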
Keywords/Search Tags: Box Office, Random Forest, GBDT, XGBoost