Operations Research Models and Methods
Paul A. Jensen and Jonathan F. Bard

Time Series and Forecasting
S1. Time Series Models

An example of a time series for 25 periods is plotted in Fig. 1 from the numerical data in the table below. The data might represent the weekly demand for some product. We use x to indicate an observation and t to represent the index of the time period. For the case of weekly demand the time period is measured in weeks. The observed demand for time t is specifically designated x_t. The lines connecting the observations in the figure are provided only to clarify the picture and otherwise have no meaning.

Table 1. Weekly demand observations

  Time    Observations
  1-10     4  16  12  25  13  12   4   8   9  14
  11-20    3  14  14  20   7   9   6  11   3  11
  21-25    8   7   2   8   8  10   7  16   9   4

[Figure 1. A time series of weekly demand: a plot of the observations x_t against the time period t, for t = 1 through 25.]

Mathematical Model

Our goal is to determine a model that explains the observed data and allows extrapolation into the future to provide a forecast. The simplest model
suggests that the time series is a constant with variations about the constant value determined by a random variable ε_t:

X_t = b + ε_t.   (1)

The upper case X_t represents the random variable that is the unknown demand at time t, while the lower case x_t is a value that has actually been observed. The random variation ε_t about the mean value is called the noise. The noise is assumed to have a mean value of zero and a given variance, and the variations in two different time periods are assumed independent. Specifically,

E(ε_t) = 0,  Var(ε_t) = σ²,  E(ε_t ε_w) = 0 for t ≠ w.

A more complex model includes a linear trend for the data:

X_t = b_0 + b_1 t + ε_t.   (2)

Of course (1) and (2) are special cases of a polynomial model,

X_t = b_0 + b_1 t + b_2 t² + … + b_n tⁿ + ε_t.

A model for a seasonal variation might include transcendental functions. The cycle of the model below is 4. The model might be used to represent data for the four seasons of the year:

X_t = b_0 + b_1 sin(2πt/4) + b_2 cos(2πt/4) + ε_t.

In every model considered here, the time series is a function only of time and the parameters of the model. We can write

X_t = f(b_0, b_1, b_2, …, b_n, t) + ε_t.

Since for any given time the value of f is a constant and the expected value of ε_t is zero,

E(X_t) = f(b_0, b_1, b_2, …, b_n, t) and Var(X_t) = Var(ε_t) = σ².

The model supposes that there are two components of variability for the time series: the mean value varies with time, and the difference from the mean varies randomly. Time is the only factor affecting the mean value, while all other factors are described by the noise component. Of course, these assumptions may not in fact be true, but this chapter is devoted to cases that can be abstracted to this simple form with reasonable accuracy. One of the problems of time series analysis is to find the best form of the model for a particular situation. In this introductory discussion we are primarily concerned with the simple constant or trend models.
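As a minimal numerical sketch of the model family above, the mean functions of the constant, trend, and seasonal models can be written directly; the parameter values b_0 = 10, b_1 = 2, b_2 = 1 and the noise standard deviation of 5 are illustrative choices, not values taken from the data:

```python
import math
import random

def constant_mean(b0, t):
    # E(X_t) for the constant model X_t = b + eps_t
    return b0

def trend_mean(b0, b1, t):
    # E(X_t) for the linear trend model X_t = b0 + b1*t + eps_t
    return b0 + b1 * t

def seasonal_mean(b0, b1, b2, t, cycle=4):
    # E(X_t) for the seasonal model; the cycle of 4 matches the text
    return (b0 + b1 * math.sin(2 * math.pi * t / cycle)
               + b2 * math.cos(2 * math.pi * t / cycle))

def simulate(mean_fn, sigma, periods, seed=0):
    # Add Normal noise eps_t with mean 0 and standard deviation sigma
    rng = random.Random(seed)
    return [mean_fn(t) + rng.gauss(0.0, sigma) for t in range(1, periods + 1)]

# The seasonal mean repeats every 4 periods
mu = [seasonal_mean(10, 2, 1, t) for t in range(1, 9)]
```

Note that every model is the same shape, a deterministic mean function of t plus noise, which is exactly the form X_t = f(b_0, …, b_n, t) + ε_t above.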
We leave the problem of choosing the best model to a more advanced text. In the following paragraphs we describe methods for fitting the model, forecasting from the model, measuring the accuracy of the forecast and
forecasting ranges. We illustrate the discussion of this section with the moving average forecasting method. Several other methods are described later in the chapter.

Fitting Parameters of the Model

Once a model is selected and data is available, it is the job of the statistician to estimate its parameters, that is, to find parameter values that best fit the historical data. We can only hope that the resulting model will provide good predictions of future observations.

Statisticians usually assume all values in a given sample are equally valid. For time series, however, most methods recognize that data from recent times are more representative than data from times well in the past. Influences governing the data probably change with time, so a method should have the capability of neglecting old data while favoring the new. A model estimate should be able to change over time to reflect changing conditions.

In the following, the time series model includes one or more parameters. We identify the estimated values of these parameters with hats on the parameter notation, for instance b̂_1, b̂_2, …, b̂_n. The procedures also provide estimates of the standard deviation of the noise, again indicated with a hat, σ̂. We will see that there are several approaches available for estimating σ_e.

To illustrate these concepts consider the data in Table 1. Say that the statistician has just observed the demand in period 20. She also has available the demands for periods 1 through 19. She cannot know the future, so the information shown for periods 21 through 25 is not available. The statistician thinks that the factors that influence demand are changing very slowly, if at all, and proposes the simple constant model for the demand,

X_t = b + ε_t.   (1)

With the assumed model, the values of demand are random variables drawn from a population with mean value b. The best estimator of b is the average of the observed data. Using all 20 points, the estimate is

b̂ = Σ_{t=1}^{20} x_t / 20 = 11.3.
This is the best estimate for the 20 data points; however, we note that x_1 is given the same weight as x_20 in the computation. If we think that the model is actually changing over time, perhaps it is better to use a method that
gives less weight to old data and more to the new. One possibility is to include only later data in the estimate. Using the last ten observations and the last five, we obtain

b̂ = Σ_{t=11}^{20} x_t / 10 = 11.2 and b̂ = Σ_{t=16}^{20} x_t / 5 = 9.4.

The latter two estimates are called moving averages. Which is the better estimate for the application? We really can't tell at this point. The estimator that uses all data points will certainly be the best if the time series follows the assumed model; however, if the model is only approximate and the situation is actually changing, perhaps the estimator with only five data points is better.

In general, the moving average estimator is the average of the last m observations,

b̂ = Σ_{i=k}^{t} x_i / m, where k = t − m + 1.

The quantity m is the time range and is the parameter of the method.

Forecasting from the Model

The purpose of modeling a time series is usually to make forecasts of the future. The forecasts are used directly for making decisions, such as ordering replenishments for an inventory or staffing workers for production. They might also be used as part of a mathematical model for a more complex decision analysis.

The current time is T, and the data for the actual demands for times 1 through T are known. Say we are attempting to forecast the demand at time T + τ. The unknown demand is the random variable X_{T+τ}, and its ultimate realization is x_{T+τ}. Our forecast of the realization is x̂_{T+τ}. Of course the best that we can hope to do is estimate the mean value of X_{T+τ}. Even if the time series actually follows the assumed model, the future value of the noise is unknowable. Assuming the model is correct,

X_{T+τ} = E(X_{T+τ}) + ε_{T+τ}, where E(X_{T+τ}) = f(b_0, b_1, b_2, …, b_n, T+τ).
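The moving-average estimate and the flat forecast it yields for the constant model can be sketched as follows. The demand list holds the first ten observations of Table 1, so the m = 10 average is 11.7, which is the forecast for period 11:

```python
def moving_average(history, m):
    # b-hat: the average of the last m observations, i.e. the sum over
    # i = k, ..., t of x_i divided by m, with k = t - m + 1
    if m > len(history):
        raise ValueError("need at least m observations")
    return sum(history[-m:]) / m

def forecast(history, m, tau=1):
    # For the constant model X_t = b + eps_t the forecast is flat:
    # x-hat_{T+tau} = b-hat regardless of the lead time tau.
    return moving_average(history, m)

demand = [4, 16, 12, 25, 13, 12, 4, 8, 9, 14]  # Table 1, periods 1-10
b_hat = moving_average(demand, 10)             # 11.7, the period-11 forecast
```

Changing m trades responsiveness for stability: a small m tracks recent shifts, a large m smooths out more of the noise.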
When we estimate the parameters from the data for times 1 through T, we have an estimate of the expected value of the random variable as a function of τ. This is our forecast:

x̂_{T+τ} = f(b̂_0, b̂_1, b̂_2, …, b̂_n, T+τ).

Using a specific value of τ in this formula provides the forecast for time T+τ. When we look at the last T observations as only one of the possible time series that could have been obtained from the model, the forecast is a random variable. We should be able to describe the probability distribution of the random variable, including its mean and variance.

For the moving average example, the statistician adopts the model

X_t = b + ε_t.

Assuming T is 20 and using the moving average with ten periods, the estimated parameter is b̂ = 11.2. Since this model has a constant expected value over time, the forecast is the same for all future periods:

x̂_{T+τ} = b̂ = 11.2 for τ = 1, 2, …

Assuming the model is correct, the forecast is the average of m observations, all with the same mean and standard deviation σ. Since the noise is Normally distributed, the forecast is also Normally distributed with mean b and standard deviation σ/√m.

Measuring the Accuracy of the Forecast

The error in a forecast is the difference between the realization and the forecast,

e_τ = x_{T+τ} − x̂_{T+τ}.

Assuming the model is correct,

e_τ = E(X_{T+τ}) + ε_{T+τ} − x̂_{T+τ}.

We investigate the probability distribution of the error by computing its mean and variance. One desirable characteristic of the forecast x̂_{T+τ} is that it be unbiased. For an unbiased estimate, the expected value of the forecast is the
same as the expected value of the time series. Since ε_t is assumed to have a mean of zero, for an unbiased forecast

E(e_τ) = 0.

Because the noise at any given time is independent of the noise at any other time, the variance of the error is

Var(e_τ) = Var[E(X_{T+τ}) − x̂_{T+τ}] + Var(ε_{T+τ}),

σ_e²(τ) = σ_E²(τ) + σ².

The variance of the error has two parts: that due to the variance in the estimate of the mean, σ_E²(τ), and that due to the variance of the noise, σ². Due to the inherent inaccuracy of the statistical methods used to estimate the model parameters, and the possibility that the model is not exactly correct, the variance in the estimate of the mean is an increasing function of τ.

For the example of the moving average,

σ_e²(τ) = σ²/m + σ² = σ²[1 + (1/m)].

The error variance is a function of m and decreases as m increases. Obviously the smallest error comes when m is as large as possible, if the model is correct. Unfortunately, we cannot be sure that the model is correct, and we set m to smaller values to reduce the error due to errors in the model.

Using the same forecasting method over a number of periods allows the analyst to compute measures of quality for the forecast for given values of τ. The forecast error, e_t, is the difference between the observed value and the forecast. For time t,

e_t = x_t − x̂_t.

Table 2 shows a series of forecasts for periods 11 through 20 using the data from Table 1. The forecasts are obtained with a moving average using m equal to 10 and τ equal to 1. We make a forecast at time t with the calculation

x̂_{t+1} = Σ_{k=t−9}^{t} x_k / 10.

Although in practice one might round the result to an integer, we keep fractions here to observe better statistical properties. The error of the forecast is the difference between the forecast and the observation.
Table 2. Forecast Error for a Moving Average

  Time          11    12    13    14    15    16    17    18    19    20
  Observation    3    14    14    20     7     9     6    11     3    11
  Forecast     11.7  11.6  11.4  11.6  11.1  10.5  10.2  10.4  10.7  10.1
  Error        -8.7   2.4   2.6   8.4  -4.1  -1.5  -4.2   0.6  -7.7   0.9

One common measure of forecasting error is the mean absolute deviation, MAD:

MAD = Σ_{i=1}^{n} |e_i| / n,

where n error observations are used to compute the mean. The sample standard deviation of the error is also a useful measure,

s_e = √[ Σ_{i=1}^{n} (e_i − ē)² / (n − p) ] = √[ (Σ_{i=1}^{n} e_i² − n ē²) / (n − p) ],

where ē is the average error and p is the number of parameters estimated for the model. As n grows, the MAD provides a reasonable estimate of the sample standard deviation:

s_e ≈ 1.25 MAD.

From the example data we compute the MAD for the ten observations,

MAD = (8.7 + 2.4 + … + 0.9)/10 = 4.11.

To compute the sample error standard deviation,

ē = (−8.7 + 2.4 − … + 0.9)/10 = −1.13,

s_e² = [8.7² + 2.4² + … + 0.9² − 10(−1.13)²] / 9 = 27.02,

s_e = 5.198.

We see that 1.25(MAD) = 5.138 is approximately equal to the sample standard deviation. Since it is easier to compute the MAD, this measure is used in our examples.¹

¹ The time series used as an example is simulated with a constant mean. Deviations from the mean are Normally distributed with mean zero and standard deviation 5. One would expect an error standard deviation of 5√(1 + 1/10) = 5.244. The observed statistics are not far from this value. Of course, a different realization of the simulation will yield different statistical values.
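The calculations above can be verified with a short computation over the rows of Table 2:

```python
import math

# Observations and one-step forecasts for periods 11 through 20 (Table 2)
obs       = [3, 14, 14, 20, 7, 9, 6, 11, 3, 11]
forecasts = [11.7, 11.6, 11.4, 11.6, 11.1, 10.5, 10.2, 10.4, 10.7, 10.1]

# Forecast error e_t = x_t - x-hat_t
errors = [x - f for x, f in zip(obs, forecasts)]

n = len(errors)   # number of error observations
p = 1             # parameters estimated (just b in the constant model)

mad   = sum(abs(e) for e in errors) / n   # mean absolute deviation
e_bar = sum(errors) / n                   # average error
s_e   = math.sqrt((sum(e * e for e in errors) - n * e_bar ** 2) / (n - p))
```

With these rows, mad is 4.11, e_bar is −1.13, and s_e is about 5.198, reproducing the text, and 1.25·mad = 5.1375 is close to s_e as claimed.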
The value of s_e² for a given value of τ is an estimate of the error variance σ_e²(τ). It includes the combined effects of errors in the model and the noise. If one assumes that the random noise comes from a Normal distribution, an interval estimate of the forecast can be computed using the Student's t distribution:

x̂_{T+τ} ± t_{α/2} s_e(τ),

where t_{α/2} is found in a Student's t distribution table with n − p degrees of freedom.
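As a sketch, a 95% interval for the example's one-step forecast can be computed with the standard table value t_{0.025, 9} = 2.262 for n − p = 9 degrees of freedom; the forecast 11.2 and s_e = 5.198 are carried over from the moving-average example above:

```python
x_hat = 11.2          # forecast from the text's moving-average example
s_e = 5.198           # sample error standard deviation computed earlier
t_half_alpha = 2.262  # t_{0.025, 9}, the 95% value from a Student's t table

# Interval estimate: x-hat +/- t_{alpha/2} * s_e
half_width = t_half_alpha * s_e
lower, upper = x_hat - half_width, x_hat + half_width
```

The interval is wide, roughly 11.2 ± 11.8, because the noise standard deviation is large relative to the mean demand; this is typical when forecasting a noisy series with a short history.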