
Research On Website Traffic Prediction Based On Combination Model And Decomposition Ensemble Model

Posted on: 2021-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z Jiang
GTID: 2510306302974529
Subject: Applied Statistics
Abstract/Summary:
With the Internet industry booming, traffic data has become a new type of data characteristic of the Internet era. As its collection improves and its scale expands, the demand for related data analysis and data mining is also increasing. On the one hand, website traffic forecasting helps companies obtain important information in a timely manner, such as development scale, user retention, and marketing effectiveness, and adjust their strategic direction more quickly. On the other hand, it helps companies implement risk control and detect and resolve anomalies promptly to avoid unnecessary losses. Website traffic forecasting therefore helps companies cope with increasingly severe competition in the Internet industry. Its accuracy and timeliness directly affect the efficiency and performance of enterprise management, so it is closely tied to practical needs.

Website traffic generally shows the typical characteristics of a time series, such as trend, periodicity, and randomness. We therefore apply time series prediction methods to website traffic prediction and focus on the accuracy of the prediction results. The data used in this paper are the daily traffic data of a travel guide website, where daily traffic is defined as the number of unique visitors who visit the website within 24 hours.

Commonly used time series prediction methods can be classified into traditional econometric modeling methods and prediction methods based on machine learning algorithms. Traditional econometric models are based on statistical models, with simple mathematical structures and fast fitting. However, they cannot fully capture non-linear features, and they place several requirements on the stationarity of the data. Neural network models, among the machine learning algorithms, are suitable for predicting nonstationary, non-linear time series because of their flexible non-linear function fitting capabilities and the few assumptions they make about the data. However, their performance is 
easily affected by data size, data distribution, and parameter settings, which makes overfitting likely. A combined model synthesizes the feature-capturing capabilities of individual models and is therefore better suited to capturing the different features of a time series, which improves prediction performance. The common combination methods are weighted average combination and linear combination, the latter fitting the time series as the sum of linear and non-linear components.

The decomposition and integration model is also commonly used in the study of time series prediction. It first decomposes the complex original time series into several simple modal components that are easy to describe and have a specific meaning. It then analyzes and predicts each component separately. Finally, it integrates the component predictions to obtain the overall prediction result. This "decomposition before integration" approach can capture and mine the characteristics of a time series on different scales, effectively reducing the difficulty of modeling and improving prediction performance. However, the highest-frequency modal component obtained after decomposition often has high sequence complexity and contains a large number of unsystematic, irregular components, which affects the overall prediction performance.

First, this article compares the prediction performance of a single traditional econometric model with that of neural network models. Because the time series is periodic, and the residual sequence plot and the ARCH-LM test show that the residual variance is heteroscedastic, the SARIMA-GARCH model is selected from among the traditional econometric models. Among the neural network models, the basic BP neural network model and the current mainstream LSTM neural network model are selected for fitting, and the influence of exogenous holiday variables is considered on the basis of the BP model. The results show that the prediction performance of a single model is greatly affected by the 
characteristics of the data itself. The traditional econometric model captures periodic time-varying features more easily and predicts better on the first test dataset, which presents stable small-scale periodic fluctuations, while the neural network models capture abrupt non-linear features more easily and predict better on the second test dataset, which has significant traffic fluctuations during the National Day holiday. The BP model with exogenous holiday variables does not predict better than the original model, and its prediction performance is unstable, which could also be related to how the exogenous variables were constructed.

Second, this article compares the prediction performance of single models with that of combination models under different combination methods. The combination methods considered include three weighted average combinations with different weight settings, a linear combination, and a non-linear combination. The linear combination method fits the linear component with the traditional econometric model and the non-linear component with the neural network model. The non-linear combination method fits the relationship between the linear and non-linear components with a non-linear function, thus relaxing the additivity assumption. The results show that, on the one hand, the non-linear combination model significantly improves prediction accuracy compared with the linear combination model, which indicates that the additivity assumption limits the prediction accuracy of the linear combination model. On the other hand, the prediction accuracy of the combination models is not necessarily better than that of a single model. Their advantage is that they improve the stability of the forecast while guaranteeing a certain level of accuracy, and they are less susceptible to large fluctuations in forecast performance caused by the statistical characteristics of the data itself.

Third, this article explores 
the prediction performance of single models and decomposition and integration models. The decomposition and integration models considered here are based on empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD). Each component is modeled with the traditional econometric model and the two neural network models. The results show that, compared with the single models, the decomposition and integration models significantly improve both prediction accuracy and direction accuracy. This demonstrates the feasibility and predictive value of applying decomposition and integration models to Internet traffic prediction.

Finally, this article compares the improved decomposition and integration models with the unimproved model. For the intrinsic mode function IMF1, which has higher sequence complexity after EEMD decomposition, two processing methods are considered, wavelet denoising and secondary decomposition with variational mode decomposition (VMD), and the improved EEMD-IMF1-WT and EEMD-IMF1-VMD models are established. The results show that the EEMD-IMF1-WT model does not perform better, because it is influenced by parameter settings and data characteristics, while the EEMD-IMF1-VMD model significantly improves prediction accuracy compared with the unimproved model. This model has the best overall performance in this paper, which shows that secondary VMD decomposition of IMF1, with its higher complexity and more irregular components, can effectively improve prediction performance.
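
As a minimal sketch of two ideas mentioned in the abstract, the weighted average combination of single-model forecasts and the direction accuracy measure, the Python below weights each model by the inverse of its RMSE. The weighting rule, the function names, the direction accuracy definition, and the toy numbers are all illustrative assumptions; the abstract does not state which weight settings the thesis uses or how it defines direction accuracy.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def inverse_error_weights(actual, forecasts):
    """Assumed weighting rule: weight each model by 1/RMSE, normalised to sum to 1.

    Raises ZeroDivisionError if a model fits the validation data perfectly.
    """
    inverse = [1.0 / rmse(actual, f) for f in forecasts]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine(forecasts, weights):
    """Point-by-point weighted average of several models' forecasts."""
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(horizon)]


def direction_accuracy(actual, predicted):
    """Assumed definition: share of steps where the predicted move from the
    previous actual value has the same sign as the actual move."""
    hits = 0
    for t in range(1, len(actual)):
        actual_move = actual[t] - actual[t - 1]
        predicted_move = predicted[t] - actual[t - 1]
        if actual_move * predicted_move > 0:
            hits += 1
    return hits / (len(actual) - 1)


if __name__ == "__main__":
    # Hypothetical daily unique-visitor counts and two hypothetical model forecasts.
    actual = [100, 120, 90, 130, 110]
    econometric = [105, 115, 95, 120, 115]
    neural_net = [90, 130, 80, 140, 100]

    weights = inverse_error_weights(actual, [econometric, neural_net])
    combined = combine([econometric, neural_net], weights)
    print("weights:", weights)
    print("combined RMSE:", rmse(actual, combined))
    print("direction accuracy:", direction_accuracy(actual, combined))
```

In practice the weights would be estimated on a validation window and applied to a later test window; the sketch reuses one toy window only to stay short.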
Keywords/Search Tags: website traffic prediction, traditional econometric model, neural network model, combination model, decomposition and integration model