Comparison of regression and ARIMA models with neural network models to forecast the daily streamflow of White Clay Creek

Posted on: 2012-09-15
Degree: Ph.D
Type: Dissertation
University: University of Delaware
Candidate: Liu, Greg Qi
Full Text: PDF
GTID: 1459390008992700
Subject: Applied Mathematics
Abstract/Summary:
Linear forecasting models have played major roles in many applications for over a century. If the error terms in a model are normally distributed, linear models are capable of producing the most accurate forecasting results. The central limit theorem (CLT) provides theoretical support for applying linear models.

During the last two decades, nonlinear models such as neural network models have gradually emerged as alternatives for modeling and forecasting real processes. In hydrology, neural networks have been applied to rainfall-runoff estimation as well as to stream and peak flow forecasting. Successful nonlinear methods rely on the generalized central limit theorem (GCLT), which provides theoretical justification for applying nonlinear methods to real processes in impulsive environments.

This dissertation attempts to predict the daily streamflow of White Clay Creek by making intensive comparisons of linear and nonlinear forecasting methods. Data are modeled and forecasted by seven linear and nonlinear methods: the random walk with drift method; the ordinary least squares (OLS) regression method; the time series Autoregressive Integrated Moving Average (ARIMA) method; the feed-forward neural network (FNN) method; the recurrent neural network (RNN) method; the hybrid OLS regression and feed-forward neural network (OLS-FNN) method; and the hybrid ARIMA and recurrent neural network (ARIMA-RNN) method. The first three methods are linear methods and the remaining four are nonlinear methods. The OLS-FNN method and the ARIMA-RNN method are two completely new nonlinear methods proposed in this dissertation.
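
As an illustration of the simplest of the seven methods, the random walk with drift, the following minimal Python sketch forecasts each future value as the last observation plus a multiple of the average historical increment. The function name and the sample flow values are hypothetical and are not taken from the dissertation, which does not describe its implementation here.

```python
from statistics import mean


def random_walk_with_drift_forecast(flows, horizon=1):
    """Forecast `horizon` steps ahead as: last value + h * average increment."""
    if len(flows) < 2:
        raise ValueError("need at least two observations to estimate the drift")
    # The drift is the mean of the first differences of the series.
    increments = [b - a for a, b in zip(flows, flows[1:])]
    drift = mean(increments)
    return [flows[-1] + h * drift for h in range(1, horizon + 1)]


# Hypothetical daily flow values, for illustration only (not dissertation data).
example_flows = [10.0, 12.0, 11.0, 13.0, 14.0]
forecasts = random_walk_with_drift_forecast(example_flows, horizon=2)
```

Because it has only one estimated parameter (the drift), this method is a natural baseline against which the regression, ARIMA, and neural network forecasts can be compared.
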
These two hybrid methods have three special features that distinguish them from any existing hybrid method available in the literature: (1) using the OLS or ARIMA residuals as the targets of the subsequent neural networks; (2) training two neural networks in parallel for each hybrid method with two objective functions (the minimum mean squared error function and the minimum mean absolute error function); and (3) using the two trained neural networks to obtain their respective forecasting results and then combining those results by a Bayesian Model Averaging technique. Final forecasts from the hybrid methods have linear components resulting from the regression method or the ARIMA method and nonlinear components resulting from feed-forward neural networks or recurrent neural networks.

Forecasting performance is evaluated by both root mean square error (RMSE) and mean absolute error (MAE). Forecasting results indicate that linear methods provide the lowest-RMSE forecasts when data are normally distributed and data lengths are long enough, while nonlinear methods provide more consistent RMSE and MAE forecasts when data are non-normally distributed. Nonlinear neural network methods also provide lower RMSE and MAE forecasts than linear methods even for data that are normally distributed but with small data samples. The hybrid methods provide the most consistent RMSE and MAE forecasts for data that are non-normally distributed.

The original flow is differenced and log differenced to obtain two differenced series: the difference series and the log difference series. These two series are then decomposed, based on stochastic process decomposition theorems, to produce two, three, and four variables that are used as input variables in regression models and neural network models.

By working on an increment series, either the difference series or the log difference series, instead of the original flow series, we get two benefits. First, we have a clear time series model.
The second benefit comes from the fact that the original flow series is an autocorrelated series, whereas an increment series is approximately an independently distributed series. For an independently distributed series, parameters such as the mean and standard deviation can be calculated easily.

In practice, the length of the data used for modeling is very important. Model parameters and forecasts are estimated from 30 data samples (1 month), 90 data samples (3 months), 180 data samples (6 months), and 360 data samples (1 year).
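
The increment series and window lengths described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's code: the function names, the synthetic flow series, and the choice of the sample (n - 1) standard deviation are all hypothetical. Only the two differencing operations and the four window lengths come from the abstract.

```python
import math
from statistics import mean, stdev


def difference(flows):
    """Difference series: x[t] - x[t-1]."""
    return [b - a for a, b in zip(flows, flows[1:])]


def log_difference(flows):
    """Log difference series: log(x[t]) - log(x[t-1]); flows must be positive."""
    if any(x <= 0 for x in flows):
        raise ValueError("log differencing requires strictly positive flows")
    return [math.log(b) - math.log(a) for a, b in zip(flows, flows[1:])]


def window_summary(series, window):
    """Mean and sample standard deviation of the most recent `window` values."""
    if window < 2 or window > len(series):
        raise ValueError("window must be between 2 and len(series)")
    recent = series[-window:]
    return mean(recent), stdev(recent)


# Window lengths named in the abstract: 1, 3, 6, and 12 months of daily data.
WINDOWS = (30, 90, 180, 360)

# Synthetic, strictly positive flow series (illustrative only, not real data).
flows = [50.0 + 5.0 * math.sin(0.1 * t) for t in range(361)]

diffs = difference(flows)
log_diffs = log_difference(flows)
summaries = {n: window_summary(log_diffs, n) for n in WINDOWS}
```

Each entry of `summaries` holds the mean and standard deviation of the log difference series over the most recent 30, 90, 180, or 360 samples, mirroring the data lengths compared in the dissertation.
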
Keywords/Search Tags:Models, Neural network, ARIMA, Data, Forecasting, Linear, Series, Regression