Evaluating Forecasting Models

Defining Simple Fitted Values

  • For most time-series forecasts, each forecast uses all previous observations
  • That is, each forecast of $y_{t}$ is based on the observations $y_{1}, ..., y_{t-1}$
  • Thus, a forecast $\hat{y}_{t}$ should strictly be denoted as $\hat{y}_{t|t-1}$

    • However, we usually just write $\hat{y}_{t}$ because it's simpler
  • Technically, fitted values aren't always true forecasts, since their parameters are estimated using all observations, including those after $y_{t}$ (e.g. $y_{t+1}$)
  • For example, the fitted values of the average method are the average of all $y$ observations, where $\hat{y}_{t} = \hat{\mu}$
  • Similarly, the fitted values of the drift method use a parameter estimated on all observations of $y$, where $\hat{y}_{t} = y_{t-1} + \hat{c}$ and $\hat{c}$ is the average change between consecutive observations
  • On the other hand, naive forecasts don't include any parameters, so their fitted values are true forecasts (but are usually less accurate)
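The three kinds of fitted values above can be sketched in plain Python. This is only an illustration: the function names and the toy series are assumptions, and the drift term is taken to be the average change over the whole series, (last value - first value) / (n - 1).

```python
def average_fitted(y):
    """Average method: every fitted value is the mean of the whole series."""
    mu_hat = sum(y) / len(y)  # estimated on ALL observations, past and future
    return [mu_hat] * len(y)


def drift_fitted(y):
    """Drift method: previous value plus the average change over the series."""
    drift = (y[-1] - y[0]) / (len(y) - 1)  # also estimated on ALL observations
    # No fitted value exists for the first observation (nothing precedes it).
    return [None] + [y[t - 1] + drift for t in range(1, len(y))]


def naive_fitted(y):
    """Naive method: the previous value, with no estimated parameters."""
    return [None] + list(y[:-1])


# Toy series, made up for illustration.
y = [10.0, 12.0, 11.0, 15.0, 14.0]

print(average_fitted(y))
print(drift_fitted(y))
print(naive_fitted(y))
```

Only `naive_fitted` uses nothing beyond the observations before time t, which is why its fitted values are true forecasts.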

Defining Residuals for Time-Series Models

  • Residuals refer to the amount we're off by when calculating a prediction $\hat{y}_{t}$ for a data value $y_{t}$
  • Specifically, a residual represents $e_{t} = y_{t} - \hat{y}_{t}$
  • Thus, we can use residuals to evaluate the accuracy of our predictions
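  • To ensure our forecasts aren't biased and aren't leaving information unused, we usually check the following:

    • Residuals are uncorrelated (otherwise information is left in the residuals that the forecasts should use)
    • Residuals have zero mean (otherwise the forecasts are biased) and constant variance
    • Residuals are normally distributed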
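As a rough sketch of how the first two checks might be computed, the snippet below calculates residuals for naive fitted values, then their mean and lag-1 autocorrelation. The helper names and the toy series are assumptions for illustration. A real analysis would look at several lags and use a formal test.

```python
def residuals(y, fitted):
    """e_t = y_t - y_hat_t, skipping positions with no fitted value."""
    return [yt - ft for yt, ft in zip(y, fitted) if ft is not None]


def mean(values):
    return sum(values) / len(values)


def lag1_autocorrelation(e):
    """Sample autocorrelation of the residuals at lag 1."""
    m = mean(e)
    denominator = sum((x - m) ** 2 for x in e)
    numerator = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, len(e)))
    return numerator / denominator


# Toy series and its naive fitted values (previous observation).
y = [10.0, 12.0, 11.0, 15.0, 14.0]
fitted = [None] + y[:-1]

e = residuals(y, fitted)
print("residuals:", e)
print("mean:", mean(e))  # far from zero suggests bias
print("lag-1 autocorrelation:", lag1_autocorrelation(e))  # far from zero suggests unused information
```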

Defining Notation for Evaluation Metrics

  • Before diving into specific evaluation metrics, let's define standard notation for forecasting metrics:

    • Let $y_{t}$ denote the current observation at time $t$
    • Let $y_{t-1}$ denote the previous observation at time $t-1$
    • Let $f_{t}$ denote the forecast of $y_{t}$
    • Let $e_{t}$ denote the forecast error where $e_{t} = y_{t} - f_{t}$
    • Let $o_{t}$ denote the one-step naive error where $o_{t} = y_{t} - y_{t-1}$

Evaluating Forecasting Accuracy using MAPE

  • The mean absolute percentage error (or MAPE) is defined as the following:
$$MAPE = 100 \cdot \text{mean}\left(\frac{|e_{t}|}{|y_{t}|}\right)$$
  • The MAPE favors forecasts that are smaller than the actual data value, which can be considered a drawback
  • In other words, the MAPE puts a heavier penalty on forecasts that exceed the actual data values than on those that fall below them
  • Said another way, the MAPE puts a heavier penalty on negative errors ($y_{t} < f_{t}$) than on positive errors
  • For non-negative forecasts, an under-forecast can cost at most 100%, while an over-forecast has no upper bound
  • Usually, we would like to avoid this asymmetry, although it may be acceptable if over-forecasting is genuinely more costly in our problem
  • The MASE can be used if we want a more symmetrical measure of forecast error (note that it is a scaled error, not a percentage error)
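The asymmetry is easier to see with numbers. Below is a minimal sketch of the MAPE formula above. The function name `mape` and the example values are made up for illustration.

```python
def mape(actual, forecast):
    """Mean absolute percentage error: 100 * mean(|e_t| / |y_t|).

    Raises ZeroDivisionError if any actual value is zero, since the
    percentage error is undefined there.
    """
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100 * sum(ratios) / len(ratios)


# Same absolute error of 50, with the roles of actual and forecast swapped.
over_forecast = mape([100.0], [150.0])   # forecast above the actual value
under_forecast = mape([150.0], [100.0])  # forecast below the actual value
print(over_forecast, under_forecast)

# With a non-negative forecast, under-forecasting is capped at 100%...
print(mape([100.0], [0.0]))
# ...while over-forecasting is unbounded.
print(mape([100.0], [300.0]))
```

Both of the first two cases miss by 50 units, but the over-forecast gets the larger percentage error, because the division is always by the actual value.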

Evaluating Forecasting Accuracy using MASE

  • The mean absolute scaled error (or MASE) is arguably considered the best available measure of forecast accuracy
  • Before we define the MASE formula, we should define a one-step naive error
  • The one-step naive error $o_{t}$ refers to the error associated with guessing the previous data value as our current prediction
  • The MASE is defined as the following:
$$MASE = \text{mean}\left(\frac{|e_{t}|}{\frac{1}{n-1}\sum_{i=2}^{n}|y_{i} - y_{i-1}|}\right)$$
  • Where the scaled error term refers to the following:
$$\text{scaled error}_{t} = \frac{|e_{t}|}{\frac{1}{n-1}\sum_{i=2}^{n}|y_{i} - y_{i-1}|}$$
  • Therefore, the MASE formula can be simplified to the following:
$$MASE = \text{mean}(\text{scaled error}_{t})$$
  • Since the denominator is the mean absolute one-step naive error, we can go one step further and write the MASE as the following:
$$MASE = \text{mean}\left(\frac{|e_{t}|}{\text{mean}(|o_{i}|)}\right)$$
  • Where the scaled error term becomes the following, with the inner mean taken over the $n-1$ naive errors $o_{2}, ..., o_{n}$:
$$\text{scaled error}_{t} = \frac{|e_{t}|}{\text{mean}(|o_{i}|)}$$
  • Essentially, the MASE is an average of our scaled errors
  • The MASE has the following benefits:

    • Scale-free, since scaled errors are independent of the scale of the data
    • Symmetrical measure
    • Less sensitive to outliers compared to other metrics
    • Easily interpreted metric using scaled errors (compared to other metrics like RMSE)
    • Less variable on small samples
  • We can interpret scaled errors based on the following criteria:

    • A scaled error is less than one if our forecast is better than the average one-step naive forecast (i.e. using the previous data point)
    • A scaled error is greater than one if our forecast is worse than the average one-step naive forecast (i.e. using the previous data point)
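A minimal sketch of the MASE calculation follows. It assumes the naive scale (the denominator) is computed on a separate `train` series and the forecast errors on later values. The function name, argument names, and data are illustrative only.

```python
def mase(train, actual, forecast):
    """Mean absolute scaled error.

    Assumption: the scale is the mean absolute one-step naive error on
    `train`, i.e. (1 / (n - 1)) * sum over i = 2..n of |y_i - y_(i-1)|.
    """
    n = len(train)
    naive_mae = sum(abs(train[i] - train[i - 1]) for i in range(1, n)) / (n - 1)
    scaled_errors = [abs(a - f) / naive_mae for a, f in zip(actual, forecast)]
    return sum(scaled_errors) / len(scaled_errors)


# Toy data, made up for illustration.
train = [10.0, 12.0, 11.0, 15.0, 14.0]  # naive errors: 2, -1, 4, -1
actual = [16.0, 15.0]
forecast = [15.0, 17.0]                 # absolute errors: 1 and 2

score = mase(train, actual, forecast)
print(score)
```

Here the mean absolute naive error is 2, so the scaled errors are 0.5 and 1.0. The first forecast beats the average one-step naive forecast and the second only matches it.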


