| With the rapid development of China's tourism industry, tourism big data analysis has attracted considerable attention. Tourism big data, however, come from many heterogeneous sources, are often incomplete, and are temporally inconsistent, which makes analysis difficult. The hotel industry is an important part of the tourism industry, and the occupancy rate is one of the key indicators of a hotel's operating performance, so predicting hotel occupancy with machine learning methods is of great significance. First, the tourism data are described and preprocessed: the data sources are identified and described, and the characteristics and patterns of the data are explained. The data are classified along a time dimension and a hotel dimension, and an initial selection of usable data is made. Data cleaning, data transformation, and data normalization are then carried out, and a data set fusion algorithm is designed to integrate the individual data sets into the best combined data set. Next, a two-level prediction model of hotel occupancy based on tourism data is proposed. The first-level model is a time-based regression model that applies polynomial regression to vehicle flow, weather conditions, wind, maximum temperature, minimum temperature, and air quality; its predicted values serve as inputs to the second-level model. The second-level model is a time- and space-based prediction and analysis model of hotel occupancy. It fuses the first-level predictions with the inherent data set and, on the fused data, builds three hotel occupancy classification models using the BP neural network, KNN, and random forest algorithms. Finally, an experimental environment is set up and the experimental results are observed and analyzed. After preprocessing, the final data set is obtained; the time-based regression prediction model for the hotel domain is implemented and its results are integrated; the time- and space-based hotel occupancy prediction model is then implemented; and the results are evaluated with the corresponding error metrics. Experiments show that the proposed models and methods are effective. |
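The abstract names data cleaning and normalization without stating the rules used. The following is a minimal sketch under two assumptions: cleaning means dropping records that have a missing value in any selected field, and normalization means min-max scaling to [0, 1]. The field names (`vehicle_flow`, `max_temp`) and values are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical raw record: field name -> value, where None marks a missing reading.
Record = Dict[str, Optional[float]]


def clean(records: List[Record], fields: List[str]) -> List[Dict[str, float]]:
    """Data cleaning (assumed rule): keep only records with every selected field present."""
    return [
        {f: float(r[f]) for f in fields}
        for r in records
        if all(r.get(f) is not None for f in fields)
    ]


def min_max_normalize(rows: List[Dict[str, float]], fields: List[str]) -> List[Dict[str, float]]:
    """Min-max normalization: rescale each field to [0, 1]; constant fields map to 0.0."""
    lo = {f: min(r[f] for r in rows) for f in fields}
    hi = {f: max(r[f] for r in rows) for f in fields}
    return [
        {f: (r[f] - lo[f]) / (hi[f] - lo[f]) if hi[f] > lo[f] else 0.0 for f in fields}
        for r in rows
    ]


if __name__ == "__main__":
    raw = [
        {"vehicle_flow": 1200, "max_temp": 31},
        {"vehicle_flow": None, "max_temp": 28},  # incomplete record, dropped by clean()
        {"vehicle_flow": 800, "max_temp": 25},
        {"vehicle_flow": 1000, "max_temp": 28},
    ]
    rows = clean(raw, ["vehicle_flow", "max_temp"])
    print(min_max_normalize(rows, ["vehicle_flow", "max_temp"]))
```

Min-max scaling keeps every feature on the same [0, 1] range. This matters for distance-based classifiers such as KNN, where an unscaled field like vehicle flow would otherwise dominate temperature.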
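The abstract does not describe how the data set fusion algorithm works. One plausible reading is an inner join of per-day tables (for example traffic, weather, and hotel records) on a shared date key, sketched below. The join key, the table layout, and the field names are all assumptions; this is not the author's algorithm.

```python
from typing import Dict, List

# Hypothetical layout: each source table maps a date string to that day's fields.
Table = Dict[str, Dict[str, float]]


def fuse_on_date(tables: List[Table]) -> Table:
    """Inner-join several per-day tables: keep only dates present in every table."""
    if not tables:
        return {}
    common = set(tables[0])
    for t in tables[1:]:
        common &= set(t)
    fused: Table = {}
    for day in sorted(common):
        merged: Dict[str, float] = {}
        for t in tables:
            merged.update(t[day])  # later tables win on duplicate field names
        fused[day] = merged
    return fused


if __name__ == "__main__":
    # Synthetic example tables.
    traffic = {"2019-05-01": {"vehicle_flow": 1200.0}, "2019-05-02": {"vehicle_flow": 900.0}}
    weather = {"2019-05-01": {"max_temp": 31.0}, "2019-05-03": {"max_temp": 27.0}}
    hotel = {"2019-05-01": {"occupancy": 0.86}, "2019-05-02": {"occupancy": 0.71}}
    print(fuse_on_date([traffic, weather, hotel]))
```

An inner join discards days that are missing from any source. That is consistent with the incomplete-data problem the abstract describes, but it shrinks the data set, so an outer join with imputation is an equally valid alternative.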
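The first-level model is described only as a time-based polynomial regression. Below is a minimal sketch that assumes each factor (here vehicle flow) is regressed on a day index t. The polynomial has a fixed degree and is fitted by ordinary least squares through the normal equations. The degree and the daily counts are illustrative, not taken from the source.

```python
from typing import List, Sequence


def polyfit(ts: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial fit; returns coefficients c0..c_degree (lowest power first)."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(t^(i+j)) and b[i] = sum(y * t^i).
    A = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs


def polyval(coeffs: Sequence[float], t: float) -> float:
    """Evaluate the fitted polynomial at time index t."""
    return sum(c * t ** i for i, c in enumerate(coeffs))


if __name__ == "__main__":
    days = [0, 1, 2, 3, 4, 5, 6]
    flow = [980, 1010, 1105, 1190, 1320, 1460, 1650]  # synthetic daily vehicle counts
    c = polyfit(days, flow, degree=2)
    print(polyval(c, 7))  # first-level forecast for the next day
```

The forecast from `polyval` is the kind of value that would be handed to the second level as an input feature. Under this reading, one such fit would be made per factor (flow, temperatures, wind, air quality).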
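Of the three second-level classifiers, KNN is the simplest to show without a library. This minimal sketch rests on two assumptions. First, the occupancy rate is discretized into classes, using hypothetical thresholds of 0.5 and 0.8. Second, each feature vector concatenates normalized first-level predictions with inherent features such as a weekend flag. Euclidean distance and majority vote are standard KNN choices, not confirmed details of the original model; the BP neural network and random forest models are not reproduced here.

```python
import math
from collections import Counter
from typing import List, Sequence


def occupancy_class(rate: float) -> str:
    """Discretize an occupancy rate into a class label (hypothetical thresholds)."""
    if rate < 0.5:
        return "low"
    if rate < 0.8:
        return "medium"
    return "high"


def knn_predict(train_X: List[Sequence[float]], train_y: List[str],
                x: Sequence[float], k: int = 3) -> str:
    """Classify x by majority vote among its k nearest training points (Euclidean distance)."""
    neighbours = sorted(
        (math.dist(xi, x), label) for xi, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic rows: [predicted vehicle flow, predicted max temp, weekend flag], normalized.
    X = [[0.1, 0.2, 0.0], [0.2, 0.1, 0.0], [0.5, 0.6, 0.0], [0.9, 0.8, 1.0], [0.8, 0.9, 1.0]]
    y = [occupancy_class(r) for r in [0.35, 0.42, 0.66, 0.91, 0.88]]
    print(knn_predict(X, y, [0.85, 0.85, 1.0], k=3))
```

Keeping k odd reduces, but does not eliminate, tied votes when there are three classes.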
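The abstract says the results are evaluated with "the corresponding error parameters" but does not name them. The metrics below are common choices, assumed here rather than taken from the source: mean absolute error (MAE) and root mean squared error (RMSE) for the first-level regression, and accuracy for the second-level classifiers.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error of regression predictions."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error; penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def accuracy(labels: Sequence[str], preds: Sequence[str]) -> float:
    """Fraction of occupancy classes predicted correctly."""
    return sum(a == b for a, b in zip(labels, preds)) / len(labels)
```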