
Research On Movie Box Office Prediction Model Based On Network Data

Posted on: 2020-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: N Liu
GTID: 2405330575952047
Subject: Applied statistics
Abstract/Summary:
As one of the main forms of entertainment in daily life, film not only satisfies the spiritual needs of audiences as a product but also promotes the rapid advancement of the cultural economy at the social level. In 2018, the overall box office of Chinese movies broke through the 60 billion yuan mark. With the rapid development of film culture, the film industry, represented by that of the United States, has become increasingly mature in its management methods and marketing. Domestic films have also entered a new period of development after stages of introduction, digestion and innovation. However, as the economy and culture continue to develop, audiences' expectations of movies have risen steadily, which directly intensifies competition in the film market. Under the joint effect of the market and the audience, the film industry inevitably suffers losses when high investment meets poor box office returns. Forecasting makes it possible to quantify the relevant factors in the early stages of a film's design, production and operation, analyze the expected box office performance, and dynamically adjust the film's operation to reduce investment risk and guide its development, which benefits both film investors and the wider economy. The development of big data, predictive models and machine learning provides data and theoretical support for box office prediction.

In this paper, we first use a web crawler to obtain data from professional movie information and statistics websites such as Time Network, Cat Eye Professional and the China Box Office Database, and then select films with box office receipts of more than 100 million from January 2015 to December 2018 as the basic data of the study. The data indicators are basic information available about each film, such as director, actors and movie type. Second, the data are pre-processed, including missing-value imputation, data set integration and quantification of the indicator variables. Indicators for factors such as the director and the actors are quantified according to the characteristics of each indicator, and categorical indicators such as movie type and release period are converted into dummy variables. Third, an exploratory analysis of the selected variables is conducted to visualize their influence on final box office income.

Next, models are built on the processed data set. The first is a multivariate regression model, which is fitted on all variables and then stripped of the insignificant ones. Because the data set is cross-sectional, the model may exhibit heteroscedasticity, which must be tested for and corrected before a better regression model is obtained. The second is a random forest model, which requires two parameters to be determined statistically: the number of trees in the model and the number of variables considered at each node of a binary tree. Once the parameters are determined, the model can be built and the importance of its variables ranked, showing the contribution of each indicator to the model from another perspective. The third is a BP neural network model. Based on previous research and the data set of this paper, the numbers of input-layer and output-layer nodes and the number of hidden layers and their nodes are determined, an activation function is selected, and the BP neural network is built.

Finally, comparing the established models on the reliability of their box office predictions shows that the random forest has the best accuracy, with the neural network second. According to the ROC curves of these two more accurate models, the AUC value of the random forest is 0.84, the closest to 1 and clearly better than that of the neural network. On both counts, the random forest model is the optimal predictive model in this paper.
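The AUC comparison used to pick the final model can be illustrated with a minimal sketch. The abstract does not say how box office values were turned into classes for the ROC analysis, so the binary labels below (1 for a hypothetical "high box office" class) and both sets of scores are invented for illustration and are not results from the thesis. The function name `auc_from_scores` is likewise an assumption. AUC is computed here as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half.

```python
def auc_from_scores(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive example is scored higher; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores from two hypothetical models (not thesis data).
labels = [1, 1, 0, 1, 0, 0]
scores_model_a = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
scores_model_b = [0.6, 0.4, 0.7, 0.5, 0.3, 0.8]

auc_a = auc_from_scores(labels, scores_model_a)
auc_b = auc_from_scores(labels, scores_model_b)

# The model whose AUC is closer to 1 ranks positives above negatives more often.
best = "model A" if auc_a > auc_b else "model B"
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, preferred: {best}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why a value closer to 1 is read as the better model.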
Keywords/Search Tags: Box office prediction, Regression model, Random forest, Neural networks