
Research On Movie Box Office Prediction Based On Network Data

Posted on: 2021-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: M Han
GTID: 2505306245981499
Subject: Applied Statistics
Abstract/Summary:
As a high-risk cultural industry, film is characterized by long production cycles and high costs, and investors bear substantial economic risk with every film they fund. This thesis therefore builds a movie box office prediction model that provides reliable estimates of a film's risk and profit. Such a model can help investors make investment decisions and can also foster a healthy evaluation mechanism in the industry that curbs the proliferation of "bad films". A total of 839 movies released from 2010 to 2019 were collected as samples for the model.

First, because word of mouth has become the primary reason people go to see a movie, the thesis uses an improved SO-PMI algorithm to construct a domain-specific sentiment dictionary for Douban movie reviews and, based on 245,277 valid reviews, calculates a positive and a negative sentiment score for each movie as box office predictors. Next, indicators were selected and quantified along four dimensions (basic movie information, star effect, word-of-mouth effect, and first-week box office), with data collected from four sources: Yien.com, Shiguang.com, Douban.com, and MaoYan Movie.com. The thesis also presents descriptive statistics for each indicator and applies a Pearson test to the correlation between each indicator and the box office. The collected data are then cleaned, model evaluation metrics are defined, single models are built with linear regression, random forest, GBDT, and XGBoost, and their predictive performance is compared. Finally, to further improve prediction, the Stacking algorithm is used to fuse these four single models into a two-layer Stacking model. Comparative experiments on different feature combinations identify the optimal combination, yielding a Stacking box office prediction model suited to the normal operating mechanism of film box office, whose prediction results are then analyzed.

The main findings are as follows. First, the correlation analysis shows that a movie's total first-week box office has a correlation coefficient of 0.96 with its final box office, the greatest influence of any indicator. The correlation coefficients of the director's historical score and popularity with the box office are 0.69 and 0.27 respectively, higher than those of the leading actors' corresponding indicators, which indicates that directors influence the box office more than actors do. Second, judged by the evaluation metrics MAE, RMSE, and MAPE, adding the positive and negative sentiment scores of reviews improves the model's predictions, and the full indicator system covering basic movie information, star effect, word-of-mouth effect, and first-week box office predicts better than indicator systems built from only some of these features. Third, the Stacking model built by multi-model fusion predicts better than any single model: 42.17% of its predictions, close to half, have an error within 15%, and 84.3% have an error within 25%. Overall, the error is within an acceptable range, and the model can serve as a reference for film investors, producers, and theater managers, helping them adjust their plans promptly and minimize box office risk.
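The abstract states only that an improved SO-PMI algorithm was used; the improvement itself is not described. For reference, the minimal sketch below implements the standard SO-PMI formulation instead: a candidate word w is scored as SO-PMI(w) = sum over positive seeds p of PMI(w, p) minus sum over negative seeds n of PMI(w, n), where PMI(a, b) = log2(P(a, b) / (P(a) P(b))) is estimated from review-level co-occurrence. It assumes reviews are already word-segmented. The function name, the seed words, the English toy tokens, and the 0.01 smoothing constant are illustrative assumptions, not the thesis's settings.

```python
import math
from typing import Dict, Iterable, List, Set


def so_pmi_scores(
    reviews: Iterable[List[str]],
    pos_seeds: Set[str],
    neg_seeds: Set[str],
    smoothing: float = 0.01,  # assumption: avoids log(0) when two words never co-occur
) -> Dict[str, float]:
    """Standard SO-PMI for every non-seed word, using review-level co-occurrence."""
    docs = [set(tokens) for tokens in reviews]
    n_docs = len(docs)
    doc_freq: Dict[str, int] = {}
    for doc in docs:
        for word in doc:
            doc_freq[word] = doc_freq.get(word, 0) + 1

    def pmi(a: str, b: str) -> float:
        # PMI(a, b) = log2(P(a, b) / (P(a) * P(b))) with document-frequency estimates
        joint = sum(1 for doc in docs if a in doc and b in doc)
        return math.log2((joint + smoothing) * n_docs / (doc_freq[a] * doc_freq[b]))

    # Only seeds that actually occur in the corpus can contribute a PMI term.
    pos = [p for p in pos_seeds if p in doc_freq]
    neg = [q for q in neg_seeds if q in doc_freq]
    scores: Dict[str, float] = {}
    for word in doc_freq:
        if word in pos_seeds or word in neg_seeds:
            continue
        scores[word] = sum(pmi(word, p) for p in pos) - sum(pmi(word, q) for q in neg)
    return scores


# Toy, already-segmented reviews (hypothetical English tokens for readability).
reviews = [
    ["great", "touching", "plot"],
    ["great", "touching"],
    ["boring", "dull", "plot"],
    ["boring", "dull"],
]
scores = so_pmi_scores(reviews, pos_seeds={"great"}, neg_seeds={"boring"})
positive_words = {w for w, s in scores.items() if s > 0}  # {"touching"}
negative_words = {w for w, s in scores.items() if s < 0}  # {"dull"}
```

A word that co-occurs equally with both seed sets, such as "plot" here, gets a score of zero and lands in neither list.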
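The abstract does not define how a movie's positive and negative sentiment scores are derived from the dictionary. One hypothetical scheme, sketched below, averages over a movie's reviews the summed weights of positive dictionary words and, separately, the summed absolute weights of negative dictionary words. The function name and the averaging rule are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def movie_sentiment_scores(
    reviews: List[List[str]], lexicon: Dict[str, float]
) -> Tuple[float, float]:
    """Hypothetical per-movie scores: mean positive weight and mean negative weight per review."""
    if not reviews:
        return 0.0, 0.0
    pos_total = 0.0
    neg_total = 0.0
    for tokens in reviews:
        for word in tokens:
            weight = lexicon.get(word, 0.0)  # words outside the lexicon contribute nothing
            if weight > 0:
                pos_total += weight
            elif weight < 0:
                neg_total += -weight  # store negative strength as a positive number
    return pos_total / len(reviews), neg_total / len(reviews)


lexicon = {"touching": 2.0, "dull": -1.0}  # e.g. SO-PMI scores of dictionary words
pos_score, neg_score = movie_sentiment_scores(
    [["touching", "plot"], ["dull", "dull"], ["touching"]], lexicon
)
# pos_score = 4.0 / 3, neg_score = 2.0 / 3
```

Keeping the two scores separate, rather than netting them into one number, lets a regression model weigh praise and criticism independently.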
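For the correlation screening, the sample Pearson coefficient r and the usual test statistic for H0: rho = 0, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, can be computed with the standard library alone. This is a sketch under the assumption that the thesis's "Pearson test" is this standard t test; it stops at the t statistic and does not compute a p-value (in practice, `scipy.stats.pearsonr` returns both r and a p-value).

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)  # perfect correlation: statistic is unbounded
    return r * math.sqrt((n - 2) / (1.0 - r * r))


r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])  # 0.8
t = pearson_t(r, 4)  # compare with the t critical value for 2 degrees of freedom
```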
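The evaluation metrics named in the abstract, together with the "share of films within X% error" figures it reports (42.17% within 15%, 84.3% within 25%), can be computed as below. MAPE is expressed in percent, relative error is taken as |actual - predicted| / |actual|, and a prediction exactly at the threshold counts as within it; the abstract does not define these conventions, so they are assumptions.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def share_within(y_true: Sequence[float], y_pred: Sequence[float], tol: float) -> float:
    """Fraction of films whose relative error |t - p| / |t| is at most tol (e.g. 0.15)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) / abs(t) <= tol)
    return hits / len(y_true)


actual = [100.0, 200.0, 400.0]     # toy box office figures
predicted = [110.0, 180.0, 400.0]  # toy model output
summary = (mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

MAPE and the within-threshold shares are scale-free, which makes them easier to compare across films whose box office differs by orders of magnitude; MAE and RMSE stay in the original currency units.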
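The two-layer Stacking idea can be illustrated without third-party libraries. In this sketch the first-layer base learners produce out-of-fold predictions via K-fold splitting, those predictions become the input features of a second-layer meta-learner, and at prediction time every base learner is refit on all training data. To stay self-contained, a linear regression and a k-nearest-neighbour regressor stand in for the thesis's four base models (linear regression, random forest, GBDT, XGBoost). The linear meta-learner, the fold assignment by index modulo K, the toy data, and all helper names are assumptions; the abstract does not specify the meta-learner or the fold scheme.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]
Model = Callable[[Vector], float]
Learner = Callable[[Matrix, Vector], Model]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b with Gauss-Jordan elimination and partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(x: Matrix, y: Vector, ridge: float = 1e-8) -> Model:
    """Linear regression with intercept via the normal equations (tiny ridge for stability)."""
    rows = [[1.0] + list(r) for r in x]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    w = solve(xtx, xty)
    return lambda v: w[0] + sum(wi * vi for wi, vi in zip(w[1:], v))


def fit_knn(x: Matrix, y: Vector, k: int = 3) -> Model:
    """k-nearest-neighbour regression: average target of the k closest training rows."""
    def predict(v: Vector) -> float:
        order = sorted(range(len(x)), key=lambda i: sum((a - b) ** 2 for a, b in zip(x[i], v)))
        nearest = order[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict


def fit_stacking(x: Matrix, y: Vector, base: Sequence[Learner],
                 meta: Learner, folds: int = 5) -> Model:
    """Two-layer stacking: out-of-fold base predictions become the meta-learner's features."""
    n = len(x)
    oof = [[0.0] * len(base) for _ in range(n)]
    for f in range(folds):
        held_out = [i for i in range(n) if i % folds == f]
        train = [i for i in range(n) if i % folds != f]
        for j, learner in enumerate(base):
            model = learner([x[i] for i in train], [y[i] for i in train])
            for i in held_out:
                oof[i][j] = model(x[i])  # prediction on rows this model never saw
    meta_model = meta(oof, y)
    # Layer 1 at prediction time: refit every base learner on all training data.
    full = [learner(x, y) for learner in base]
    return lambda v: meta_model([m(v) for m in full])


x = [[float(i), float((2 * i) % 5)] for i in range(20)]  # toy features
y = [2 * a + 3 * b + 1 for a, b in x]                      # toy target, exactly linear
model = fit_stacking(x, y, base=[fit_linear, fit_knn], meta=fit_linear)
prediction = model([7.5, 2.0])  # true value: 2 * 7.5 + 3 * 2.0 + 1 = 22.0
```

Training the meta-learner on out-of-fold rather than in-sample predictions keeps it from simply trusting whichever base model overfits most. In practice the folds would be shuffled, and with scikit-learn available, `sklearn.ensemble.StackingRegressor` implements the same pattern with real random forest and gradient boosting estimators.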
Keywords/Search Tags: Box office prediction, Sentiment analysis, Correlation analysis, Machine learning, Model fusion