| Sports consumption is an important part of the national economy. Of the eight categories of residents’ consumption, every category except residential consumption includes sports consumption. Sports consumption plays an increasingly important role in improving living standards, relieving the pressures of daily life, stimulating domestic demand, and promoting consumption, and it is gradually becoming an important force in economic restructuring, modernization, and industrial upgrading. As a component of the commodity retail price index, the retail price index of sports and entertainment products is not only an important measure of price changes in sports products but also an important reference for the government in monitoring the development of the sports industry and making decisions. Its adjustments and changes bear on the formulation and implementation of regulatory measures for the sports industry. Because the index is released with an obvious time lag, it reflects the operation of the sports industry only with a delay. At the same time, many factors affect the index, so its changes show complex nonlinear characteristics that traditional econometric and time series models struggle to fit effectively. With the spread of computers and mobile phones and the development of the Internet, the way people obtain information is shifting from traditional media channels to online channels. As the gateway to the Internet, the search engine is an important entry point for obtaining information. When market participants look up relevant information through search engines, the search engines record their queries and form network search data. These data can serve as a quantitative indicator of consumers’ and producers’ attention and provide a scientific and reasonable data source for research on many problems. Therefore, this
thesis uses network search data, combined with feature subset extraction and selection methods and machine learning algorithms, to predict the retail price index of sports and entertainment products, so as to improve the prediction accuracy and timeliness of the model. The thesis is divided into five parts. The first part, starting from equilibrium price theory, analyzes the main factors influencing fluctuations in the retail price index of sports and entertainment products, explains from a theoretical point of view the relationship between the prices of sports and entertainment products and Internet information search, and analyzes the ability of online search data to predict the index. The second part, building on the theoretical analysis and drawing on the compilation method of the commodity retail price index, the types of sporting goods, and the relevant literature, sets up an initial key thesaurus and expands it by secondary search, for a total of 236 keywords. The third part uses Python to crawl the data and selects, by time difference correlation analysis, the leading keywords that are highly correlated with the retail price index of sports and entertainment products. The fourth part applies PCA, stepwise regression, lasso, and recursive feature elimination to extract and select features from the data. The fifth part establishes 15 groups of prediction models by combining the five groups of feature subsets selected above with three machine learning algorithms: random forest regression, BP neural network, and support vector regression. The traditional ARIMA time series model is introduced as a benchmark, and the strengths and weaknesses of the 16 groups of prediction results are compared and analyzed. The results show that, in terms of feature subsets, the subset selected by the recursive feature elimination algorithm based on support vector regression (RFE-SVR) performs best in all three prediction models. In terms of prediction
model, the BP neural network prediction based on RFE-SVR improves RMSE by 0.3% and MAPE by 29.3% compared with the traditional ARIMA time series model. Compared with the prediction results of SVR and RF with their optimal feature subsets, RMSE and MAPE are improved by 0.02 and 0.15% and 6% and 18.7% respectively. The research shows that prediction models built with machine learning algorithms are generally better than the traditional time series model. Among the machine learning models, the BP neural network also has better prediction accuracy and stability than support vector regression and random forest. In terms of model fitting, the goodness of fit (R-squared) of the BP neural network prediction model based on RFE-SVR is 0.944; it fits the trend of the original time series well and outperforms the other models. In terms of inflection point prediction, the predicted values from the optimal model successfully capture the inflection points in the time trend of the retail price index of sports and entertainment products. At the same time, the predicted values are available about a month ahead of the official data, which provides a reference for timely monitoring of changes in the index and for industrial policy adjustment. |
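
The time difference correlation analysis used in the third part to pick leading keywords can be illustrated with a minimal sketch. It assumes the common lead-lag form of the method: the keyword search series is shifted against the price index over a range of lags, the Pearson correlation is computed at each lag, and the lag with the largest absolute correlation is kept. The function names, the `max_lag` parameter, and the toy series below are hypothetical and are not taken from the thesis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(vx * vy)


def lead_lag_correlation(keyword, index, max_lag=3):
    """Correlate keyword[t - lag] with index[t] for lag in -max_lag..max_lag.

    Both series must have the same length. A positive lag means the
    keyword series leads the index by `lag` periods. Returns the pair
    (best_lag, best_r) with the largest absolute correlation.
    """
    n = len(index)
    results = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = keyword[: n - lag], index[lag:]
        else:
            x, y = keyword[-lag:], index[: n + lag]
        results[lag] = pearson(x, y)
    best_lag = max(results, key=lambda k: abs(results[k]))
    return best_lag, results[best_lag]


# Toy data (hypothetical): the index repeats the keyword series two periods later.
keyword_series = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
price_index = [0, 0] + keyword_series[:-2]
best_lag, best_r = lead_lag_correlation(keyword_series, price_index)
```

Under this sketch, a keyword would count as "leading" when its best lag is positive and its correlation is high; the actual thresholds used in the thesis are not stated in the abstract.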