With the continuous improvement of living standards, film, as a form of cultural entertainment, has brought substantial social and economic benefits. While the domestic film industry faces new opportunities, it also faces many challenges, such as information asymmetry, misleading public opinion and inadequate pre-release publicity. These lead to sharp fluctuations in box office revenue, which pose a major risk to investors and distributors and hinder the healthy development of the film industry. Studying the factors that influence the box office revenue of domestic films, and forecasting that revenue, therefore helps the film industry form a virtuous cycle with other industries and jointly promote economic development. The main contents and conclusions of this paper are as follows.

(1) Based on a review of the relevant literature and theory, the methods for data collection, indicator selection, and the construction and evaluation of prediction models are first identified. The key factors influencing the box office revenue of domestic films are then analysed at two levels: the micro level ("the film itself, production and distribution, sentiment and word-of-mouth") and the macro level ("residents' income and consumption, film industry development and the epidemic environment"). Secondly, a Python web crawler was used to collect the feature variables needed for the research from websites such as Mtime, EntGroup (Yien Consulting), Maoyan Pro (Cat's Eye Movie Pro) and the National Bureau of Statistics. The raw data were cleaned, missing values were filled in, and the variables were encoded and quantified with One-Hot encoding and similar measures. In particular, the review data were segmented with jieba and analysed with a Chinese sentiment analysis library to determine the sentiment tendency of each review text, extract the sentiment of the movie reviews and construct a sentiment score indicator. These steps yielded the initial box office prediction index system (I).

(2) The box office prediction index system (I) was screened using the random forest method and Bayesian model averaging, and 12 key indicators were selected according to the importance and posterior probability of each feature variable: the number of "want to see" marks, the number of ratings, residents' per capita disposable income, residents' consumption level, the distribution company, the influence of the lead actor, the influence of the director, the total number of screens nationwide, the 3D/IMAX format, the Chinese New Year release slot, the epidemic factor and the sentiment score. These indicators form the box office prediction index system for domestic films.

(3) In the empirical analysis, the distribution of domestic box office revenue was first analysed, and the K-means method was used to classify box office revenue into three categories (high, medium and low), with the number of clusters chosen by the elbow rule. Secondly, the correlation between box office revenue and each key influencing factor was examined, and SMOTE (Synthetic Minority Over-sampling Technique) oversampling and ENN (Edited Nearest Neighbours) undersampling were used to address class imbalance in the sample data before the prediction models were built. Thirdly, multinomial logistic regression, support vector machine, artificial neural network, decision tree, random forest and XGBoost models for forecasting the box office revenue of domestic films were constructed, and their parameters were optimised with cross-validation and grid search. Fourthly, the five single models with the best prediction results were integrated using the Stacking algorithm, and the ensemble was compared with each single model. Finally, the constructed Stacking ensemble model was used for out-of-sample case prediction, and the prediction results are analysed and explained.

(4) Based on the findings and conclusions of the thesis, suggestions for improving the box office of Chinese films are made: doing a good job of pre-release promotion of film products, focusing on technological innovation in film projection, building word-of-mouth for film products, leveraging the influence of directors and actors, and building a virtuous cycle in the film market.
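Step (1) of the abstract fills missing values and One-Hot encodes categorical variables. The abstract does not say which filling rule was used, so the minimal sketch below assumes median imputation for a numeric column; the column names `distributor` and `screens` and the records themselves are hypothetical.

```python
from statistics import median

def fill_missing_with_median(rows, column):
    """Return copies of `rows` with None in a numeric `column` replaced by the column median."""
    observed = [r[column] for r in rows if r[column] is not None]
    m = median(observed)
    return [{**r, column: m if r[column] is None else r[column]} for r in rows]

def one_hot(rows, column):
    """Replace a categorical `column` with one 0/1 indicator column per observed category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}={c}"] = int(r[column] == c)
        encoded.append(new_row)
    return encoded

# Hypothetical records: distributor names and screen counts are made up.
films = [
    {"distributor": "A", "screens": 1200},
    {"distributor": "B", "screens": None},
    {"distributor": "A", "screens": 800},
]
clean = one_hot(fill_missing_with_median(films, "screens"), "distributor")
```

Here the missing screen count becomes 1000, the median of the observed values, and `distributor` is replaced by the indicator columns `distributor=A` and `distributor=B`.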
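Step (1) also builds a sentiment score from reviews segmented with jieba and scored with a Chinese sentiment analysis library. The abstract names neither the library nor the scoring rule, so the sketch below swaps in a plain lexicon-based scorer that works on already-segmented tokens (jieba's `lcut` would normally produce them; it is not called here). The tiny lexicon, the one-word negation rule and the averaging into a film-level score are all illustrative assumptions.

```python
from statistics import mean

# Hypothetical toy lexicon; a real study would use a full sentiment dictionary or model.
POSITIVE = {"精彩", "好看", "感人"}
NEGATIVE = {"无聊", "失望", "拖沓"}
NEGATORS = {"不", "没"}

def review_score(tokens):
    """Score one segmented review in [-1, 1] as (pos - neg) / (pos + neg)."""
    pos = neg = 0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True  # flips the polarity of the immediately following word only
            continue
        if tok in POSITIVE or tok in NEGATIVE:
            is_positive = (tok in POSITIVE) != negate
            if is_positive:
                pos += 1
            else:
                neg += 1
        negate = False
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def film_sentiment(segmented_reviews):
    """Film-level sentiment score: the mean of the per-review scores."""
    return mean(review_score(t) for t in segmented_reviews)

# Reviews already split into words (hypothetical examples).
reviews = [["剧情", "精彩", "演员", "感人"], ["不", "好看", "有点", "无聊"], ["一般"]]
```

The first review scores 1.0, the second scores -1.0 because "不" flips "好看", and the third contains no lexicon words and scores 0.0.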
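Step (2) screens the candidate indicators by random forest importance and by Bayesian model averaging (BMA) posterior probability. The abstract does not specify the importance measure, the model priors or the likelihood approximation, so the formulas below show one common formulation, assumed here: mean decrease in Gini impurity for a classification forest (a regression forest would use node variance instead of $i(v)$, and libraries usually normalise each tree's importances), and BIC-approximated posterior model probabilities under equal model priors. Here $p_{v,c}$ is the share of class $c$ at node $v$, $n_v$ the number of samples reaching $v$, $n$ the sample size, $T$ the number of trees, $M_1,\dots,M_K$ the candidate models (subsets of indicators) and $D$ the data.

$$
i(v) = 1 - \sum_{c} p_{v,c}^{2}, \qquad
\Delta i(v) = i(v) - \frac{n_{v_L}}{n_v}\, i(v_L) - \frac{n_{v_R}}{n_v}\, i(v_R), \qquad
\mathrm{Imp}(x_j) = \frac{1}{T} \sum_{t=1}^{T} \sum_{v \in t:\ \text{split on } x_j} \frac{n_v}{n}\, \Delta i(v)
$$

$$
p(M_k \mid D) = \frac{p(D \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(D \mid M_l)\, p(M_l)}
\approx \frac{\exp\!\left(-\tfrac{1}{2}\mathrm{BIC}_k\right)}{\sum_{l=1}^{K} \exp\!\left(-\tfrac{1}{2}\mathrm{BIC}_l\right)},
\qquad
\mathrm{PIP}_j = \sum_{k:\ x_j \in M_k} p(M_k \mid D)
$$

Under this formulation, indicators with a large $\mathrm{Imp}(x_j)$ and a high posterior inclusion probability $\mathrm{PIP}_j$ are the natural candidates to retain.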
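Step (3) groups box office revenue into high, medium and low classes with K-means, choosing the number of clusters by the elbow rule. The sketch below runs Lloyd's algorithm on one-dimensional revenue values. Two choices are assumptions rather than the thesis's method: a deterministic quantile initialisation (library implementations typically use k-means++ with several restarts), and an automated elbow rule that stops at the first $k$ whose next cluster cuts the within-cluster sum of squares (SSE) by less than 10%. The revenue figures are hypothetical.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on 1-D data with deterministic quantile initialisation.

    Returns (centroids, labels, sse), where sse is the within-cluster sum of squares.
    """
    s = sorted(values)
    n = len(s)
    centroids = [s[int((i + 0.5) * n / k)] for i in range(k)]

    def assign(cs):
        # Index of the nearest centroid for every value (ties go to the lower index).
        return [min(range(k), key=lambda c: abs(v - cs[c])) for v in values]

    for _ in range(max_iter):
        labels = assign(centroids)
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(centroids)
    sse = sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))
    return centroids, labels, sse

def elbow_k(values, k_max=5, tol=0.1):
    """Smallest k after which one more cluster cuts the SSE by less than `tol` (relative)."""
    sse = [kmeans_1d(values, k)[2] for k in range(1, k_max + 1)]
    for k in range(1, k_max):
        if sse[k - 1] == 0 or (sse[k - 1] - sse[k]) / sse[k - 1] < tol:
            return k
    return k_max

# Hypothetical box office revenues (in 100 million yuan).
box_office = [0.1, 0.2, 0.3, 5.0, 5.5, 6.0, 30.0, 32.0, 35.0]
k = elbow_k(box_office)
centroids, labels, sse = kmeans_1d(box_office, k)
```

On this toy data the SSE falls steeply up to three clusters and barely moves at four, so the rule returns three clusters whose centroids (about 0.2, 5.5 and 32.3) correspond to low, medium and high revenue.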
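Step (3) oversamples the minority class with SMOTE. The core idea is to create each synthetic point on the segment between a minority sample and one of its $k$ nearest minority-class neighbours: $x_{\text{new}} = x + \lambda\,(x_{\text{nb}} - x)$ with $\lambda$ drawn uniformly from $[0, 1)$. The sketch below implements only that interpolation step, assuming Euclidean distance and $k = 3$; the feature values are hypothetical.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling: create n_new points between minority samples and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority-class neighbours of x (Euclidean distance)
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Hypothetical minority class ("high" box office) with two features lying on y = 2x.
high = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
new_points = smote(high, n_new=5)
```

Because every synthetic point is an interpolation between two real minority samples, the new points here also lie on the line y = 2x, between x = 1 and x = 4.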
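Step (3) follows SMOTE with ENN undersampling, which removes samples that look like noise relative to their neighbourhood. The sketch follows Wilson's original majority-vote rule with $k = 3$; some library implementations default to stricter variants, and which variant the thesis used is not stated. The sample points are hypothetical.

```python
import math
from collections import Counter

def enn(X, y, k=3):
    """Edited Nearest Neighbours (Wilson's rule): drop each sample whose label
    differs from the majority label of its k nearest neighbours."""
    kept_X, kept_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(xi, X[j]))[:k]
        majority = Counter(y[j] for j in nearest).most_common(1)[0][0]
        if majority == yi:
            kept_X.append(xi)
            kept_y.append(yi)
    return kept_X, kept_y

# Hypothetical samples: (0.5, 0.5) is a "high" point sitting inside the "low" region.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5), (5, 6), (6, 5)]
y = ["low"] * 4 + ["high"] * 4
X_clean, y_clean = enn(X, y)
```

The isolated "high" point at (0.5, 0.5) has three "low" nearest neighbours and is removed; every other sample agrees with its neighbourhood and is kept.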
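Step (3) tunes each model with cross-validation and grid search: every combination of candidate hyperparameters is scored by its mean validation accuracy across folds, and the best combination is kept. The thesis tunes logistic regression, SVM, neural network, tree and XGBoost models; to stay self-contained, the sketch below swaps in a k-nearest-neighbours classifier with a hypothetical grid over `n_neighbors` and the Minkowski exponent `p`, and assumes the data are already shuffled so that contiguous folds are representative.

```python
from collections import Counter
from itertools import product

def minkowski(a, b, p):
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1 / p)

def knn_predict(train_X, train_y, x, n_neighbors, p):
    """Majority label among the n_neighbors closest training points."""
    nearest = sorted(range(len(train_X)), key=lambda i: minkowski(train_X[i], x, p))[:n_neighbors]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def cv_accuracy(X, y, params, n_folds=3):
    """Mean accuracy over n_folds contiguous folds (data assumed already shuffled)."""
    n = len(X)
    scores = []
    for f in range(n_folds):
        lo, hi = f * n // n_folds, (f + 1) * n // n_folds
        tr_X, tr_y = X[:lo] + X[hi:], y[:lo] + y[hi:]
        correct = sum(knn_predict(tr_X, tr_y, X[i], **params) == y[i] for i in range(lo, hi))
        scores.append(correct / (hi - lo))
    return sum(scores) / n_folds

def grid_search(X, y, grid, n_folds=3):
    """Evaluate every parameter combination; return (best_params, best_cv_accuracy)."""
    names = list(grid)
    best_params, best_score = None, -1.0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_accuracy(X, y, params, n_folds)
        if score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical, pre-shuffled two-feature samples.
X = [(0, 0), (5, 5), (0, 1), (5, 6), (1, 0), (6, 5)]
y = ["low", "high", "low", "high", "low", "high"]
best_params, best_score = grid_search(X, y, {"n_neighbors": [1, 3], "p": [1, 2]})
```

On this cleanly separated toy data every combination reaches 100% validation accuracy, so the first one in grid order is returned; on real data the scores differ and the grid picks the best trade-off.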
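Step (3) then combines the five best single models with Stacking. The usual mechanics are: collect out-of-fold predictions from each base model, train a meta-learner on those predictions, and refit the base models on all the training data. The abstract does not specify the meta-learner (logistic regression is a common choice, e.g. the default in scikit-learn's `StackingClassifier`). The sketch below therefore uses two toy base learners (nearest centroid and 1-nearest-neighbour) and a hypothetical lookup-table meta-learner purely to show that data flow; the revenue-class data are made up.

```python
import math
from collections import Counter, defaultdict

def fit_nearest_centroid(X, y):
    """Toy base learner: predict the class whose mean vector is closest."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    centroids = {lab: tuple(sum(c) / len(c) for c in zip(*pts)) for lab, pts in groups.items()}
    return lambda x: min(centroids, key=lambda lab: math.dist(x, centroids[lab]))

def fit_one_nn(X, y):
    """Toy base learner: copy the label of the single nearest training point."""
    return lambda x: y[min(range(len(X)), key=lambda i: math.dist(x, X[i]))]

def fit_lookup_meta(meta_X, y):
    """Hypothetical meta-learner: most frequent true label for each tuple of base
    predictions, falling back to a majority vote of the base predictions if unseen."""
    table = defaultdict(Counter)
    for feats, label in zip(meta_X, y):
        table[tuple(feats)][label] += 1

    def predict(feats):
        key = tuple(feats)
        if key in table:
            return table[key].most_common(1)[0][0]
        return Counter(feats).most_common(1)[0][0]

    return predict

def fit_stacking(base_fitters, X, y, n_folds=3):
    n = len(X)
    meta_X = [[None] * len(base_fitters) for _ in range(n)]
    # 1. Out-of-fold predictions: each sample is predicted by models that never saw it.
    for f in range(n_folds):
        lo, hi = f * n // n_folds, (f + 1) * n // n_folds
        tr_X, tr_y = X[:lo] + X[hi:], y[:lo] + y[hi:]
        for m, fit in enumerate(base_fitters):
            predict = fit(tr_X, tr_y)
            for i in range(lo, hi):
                meta_X[i][m] = predict(X[i])
    # 2. Train the meta-learner on the out-of-fold predictions.
    meta = fit_lookup_meta(meta_X, y)
    # 3. Refit every base model on the full training set for use at prediction time.
    bases = [fit(X, y) for fit in base_fitters]
    return lambda x: meta([b(x) for b in bases])

# Hypothetical one-feature samples, interleaved so that each fold holds every class.
X = [(1,), (10,), (20,), (2,), (11,), (21,), (3,), (12,), (22,)]
y = ["low", "medium", "high"] * 3
model = fit_stacking([fit_nearest_centroid, fit_one_nn], X, y)
```

Using out-of-fold rather than in-sample predictions keeps the meta-learner from simply trusting whichever base model overfits the training data most.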