Foresight: The International Journal of Applied Forecasting, Issue 4, June 2006
A publication of the International Institute of Forecasters

ACCURACY AND ACCURACY-IMPLICATION METRICS FOR INTERMITTENT DEMAND
by John Boylan and Aris Syntetos

Preview: John and Aris distinguish between forecast-accuracy metrics, which measure the errors resulting from a forecast method, and accuracy-implication metrics, which measure the achievement of the organization’s stock-holding and service-level goals. Both measurements are important. The correct choice of a forecast-accuracy metric depends on the organization’s inventory rules and on whether accuracy is to be gauged for a single item or across a range of items. The authors recommend specific accuracy and accuracy-implication metrics for each context.

John Boylan is Professor of Management Science at Buckinghamshire Chilterns University College. Previously, he worked in OR at Rolls-Royce and at the Unipart Group. His research and publications (Journal of the OR Society, International Journal of Production Economics, International Journal of Forecasting) have increasingly focused on the challenges of forecasting slow, intermittent and lumpy demands.

Aris Syntetos is a Lecturer in Operations Management and Operational Research at the University of Salford, UK. His research interests include intermittent-demand forecasting and the interface between forecasting and stock control. On behalf of the Salford Business School, he is currently involved in two inventory-management projects, one with an engineering firm and one with an international wholesaling company.
In considering forecasting-accuracy metrics for intermittent demand, we should begin by looking at the inventory method. Depending on that method, we may need estimates of mean demand, variance of demand, percentiles of demand, and probabilities of high-demand values. When a forecast of mean demand is needed, the accuracy of the forecast for an individual item can be judged by the mean absolute error (MAE). To assess forecast accuracy across a range of items, a scale-independent metric, such as the ratio of the mean absolute error to the mean demand, is appropriate. Alternatively, the geometric mean absolute error (GMAE) may be used. If forecasts of percentiles of demand or probabilities of high-demand values are needed, then an appropriate chi-square test should be used, concentrating on the upper end of the distribution (for example, the 95th percentile). No matter which inventory system is used, the accuracy-implication metrics of stock-holding and service levels should always be considered.

Introduction

In the February 2006 edition of Foresight, Kenneth Kahn poses the following question: “Should we view forecast accuracy as an end in itself or rather as a means to an end?” (Kahn, 2006, p. 25). Most commonly, intermittent-demand forecasting is a means toward the twin ends of lowering stock-holding costs (including costs of stock obsolescence) and maintaining or improving stock availability (“service level”). The achievement of these goals depends not only on the accuracy of the forecasting method but also on the suitability of the inventory rules determining the timing and size of orders. The relationship between these factors and the system’s goals is shown in Figure 1.

If we regard the design and implementation of a stock-management system as a means toward an end, then the outcome measures on the right-hand side of Figure 1 should not be ignored. These measures ensure that forecasters and inventory managers do not lose sight of the system’s purpose.
[Figure 1. Relationship Among Forecasting, Inventory Rules and Performance Measures. The forecasting method and the inventory rules together form the stock management system; its outcome measures are stock-holding costs and service level.]

Accuracy-Implication Metrics

Most managers would regard stock-holding costs and service level as outcome measures rather than accuracy measures. But if we keep the inventory rules fixed and try different forecasting methods, these outcome measures become accuracy-implication measures. The term “accuracy implication” is used instead of “accuracy” because metrics such as service level do not measure the accuracy of a forecasting method, but they do measure the implication of its accuracy under a given inventory rule.

A good example of this approach is the study by Eaves and Kingsman (2004). They estimate the effect of forecasting-method choice on 18,750 line items, including intermittent items. Importantly, they assume a constant service-level requirement, allowing accuracy implications to be assessed. For example, for quarterly data, they find that using single exponential smoothing instead of Croston’s method requires an additional stock investment of £1.28m. Therefore, instead of simply reporting that Croston’s method is more accurate than smoothing, they show the cost implication of making the wrong choice.

What Types of Forecasts Are Required?

To identify the most appropriate accuracy metrics, we must first ask what is to be forecast. The variables to be forecast depend on the inventory method. For example, suppose that we use a periodic (R, s, S) inventory rule. This means that we review the inventory system every R periods, and when the stock level is at or below a certain reorder point (called s), we order enough stock to take us back up to the order-up-to level (called S). Some of the most effective methods for finding s and S, including Naddor’s heuristic (Naddor, 1975), require only estimates of the mean and variance of demand.
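The mechanics of the periodic (R, s, S) rule can be illustrated with a minimal sketch. It is not taken from any of the cited studies. The function name `simulate_rss`, the zero replenishment lead time, and the treatment of unmet demand as lost are all simplifying assumptions made purely for illustration; here s is the reorder point and S is the level to which stock is raised when an order is placed.

```python
def simulate_rss(demand, R, s, S, initial_stock):
    """Return the order quantity placed in each period under an (R, s, S) rule.

    Illustrative assumptions: orders arrive instantly (zero lead time) and
    demand that cannot be met from stock is lost rather than backordered.
    """
    stock = initial_stock
    orders = []
    for period, d in enumerate(demand, start=1):
        stock = max(stock - d, 0)          # serve demand from stock
        order = 0
        is_review_period = (period % R == 0)
        if is_review_period and stock <= s:
            order = S - stock              # raise stock back up to S
            stock = S
        orders.append(order)
    return orders


# Intermittent demand over six periods, reviewed every R = 2 periods.
orders = simulate_rss([0, 4, 0, 0, 5, 0], R=2, s=2, S=6, initial_stock=6)
print(orders)  # [0, 4, 0, 0, 0, 5]
```

In this sketch the forecasts never appear directly: they enter only through the choice of s and S, which is why the accuracy of the mean and variance estimates matters for this rule.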
To take a second example, suppose that we use an (s, Q) inventory method. In this case, we review the stock continuously and place an order of fixed quantity Q if the stock drops to the reorder level s or below. This system is also known as (r, Q). In some systems, we wish to ensure that there is no more than a 10% chance of stockout during the replenishment cycle (review time plus lead time). For this case, we need to estimate the 90th percentile of the distribution of demand over the replenishment cycle, rather than the mean and the variance. Another alternative is that we wish to ensure that at least 90% of demand is satisfied directly off the shelf. Please note that this is not the same as a 90% chance of no stockouts; this point is discussed in greater detail by Silver, Pyke, & Peterson (1998, pp. 266-270). In this case, we need to estimate the probabilities of any demands that exceed the reorder level. Here, instead of estimates of percentiles, we want estimates of the probabilities of high demand.

Should We Forecast the Entire Demand Distribution?

Willemain (this issue) suggests that the general problem is to forecast the whole distribution of demand. It is true that this is the most general statement of the problem. However, as we have already noted, some inventory systems require estimates of only the mean and variance. For other systems, estimates of high percentiles and probabilities of high-demand values are needed; even in these cases, we do not need a forecast of the entire distribution.

Measures based on the entire distribution can be misleading. A good overall “goodness of fit” statistic may result from excellent forecasts of the chances of low-demand values, which can mask poor forecasts of the chances of high-demand values. It may be that for other applications (for example, revenue forecasts), forecasts of low percentiles are required (Willemain et al., 2004).
However, for inventory calculations, we suggest that attention be restricted to the upper end of the distribution (the 90th or 95th percentiles).

In summary, percentile forecasts and estimates of probabilities of demand are required for some inventory systems. For other systems, we need forecasts of the mean and variance of demand. All these quantities are features of the overall distribution of demand. It is the accuracy of determining the key quantities (for example, mean demand, 90th percentile) required for the inventory rules that is important, rather than the accuracy across the entire demand distribution.

Estimates of Mean Demand

When it is necessary to forecast the mean demand level, there are two issues to address:
1. What is the best forecasting method for a particular stock-keeping unit (SKU)?
2. What is the best forecasting method across a range of SKUs?
The second problem is more common in practice, but answering the first question gives us some insight into how to answer the second.

The case of a single SKU

For a single SKU, we may use a simple measure such as the mean absolute error (MAE) to measure a method’s accuracy in forecasting mean demand. (The mean absolute error is calculated by noting each of the errors and treating them all as positive in sign, and then averaging them.) The mean squared error is not suitable for intermittent-demand items because it is sensitive to the occurrence of very high forecast errors.

The accuracy of a method’s mean-demand forecasts can be compared with another method by calculating the percentage of series for which it has a lower MAE. This approach is known as the Percentage Better method, which is discussed in more detail by Boylan (2005). The approach can be easily extended to the comparison of more than two methods; in that case, it would be termed Percentage Best.
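The MAE and Percentage Better calculations described above can be sketched as follows. The function names, the dictionary layout keyed by SKU, and the demand and forecast figures are hypothetical, chosen only to make the arithmetic easy to follow. Counting a tie as "not better" is also an assumption; the article does not say how ties should be treated.

```python
def mean_absolute_error(actuals, forecasts):
    """Average of the forecast errors with every error treated as positive."""
    errors = [abs(a - f) for a, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)


def percentage_better(actuals_by_sku, forecasts_a, forecasts_b):
    """Percentage of series for which method A has a strictly lower MAE than B."""
    wins = 0
    for sku, actuals in actuals_by_sku.items():
        mae_a = mean_absolute_error(actuals, forecasts_a[sku])
        mae_b = mean_absolute_error(actuals, forecasts_b[sku])
        if mae_a < mae_b:   # a tie does not count as a win (assumption)
            wins += 1
    return 100.0 * wins / len(actuals_by_sku)


# Hypothetical demand histories and forecasts for four intermittent SKUs.
actuals = {"sku1": [0, 0, 4, 0], "sku2": [0, 2, 0, 0],
           "sku3": [0, 0, 5, 0], "sku4": [3, 0, 0, 0]}
method_a = {"sku1": [1, 1, 1, 1], "sku2": [1, 1, 1, 1],
            "sku3": [2, 2, 2, 2], "sku4": [1, 1, 1, 1]}
method_b = {"sku1": [2, 2, 2, 2], "sku2": [3, 3, 3, 3],
            "sku3": [1, 1, 1, 1], "sku4": [2, 2, 2, 2]}

print(mean_absolute_error(actuals["sku1"], method_a["sku1"]))  # 1.5
print(percentage_better(actuals, method_a, method_b))          # 75.0
```

Method A has the lower MAE on three of the four series, so its Percentage Better score against method B is 75%.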
A limitation of the Percentage Better method is that, although it summarizes the frequency with which one method outperforms another, it does not inform the user of the degree of improvement in accuracy. Averaging the values of mean absolute error across series would seem to be the obvious answer. Unfortunately, such measures can be dominated by a small number of SKUs with large errors. This problem is known as scale dependence.

For non-intermittent data, an effective way of addressing the scale-dependence problem is to calculate the mean absolute percentage error (MAPE). However, the MAPE measure fails for intermittent data because the denominator (actual value) is frequently zero. Amending the denominator to unity when the actual value is zero, as suggested by Jim Hoover in this section, is a pragmatic idea, but it is without any foundation in statistical theory.

Another option mentioned by Hoover is the symmetric MAPE (sMAPE), in which the numerator is the absolute value of the actual minus forecast, and the denominator is the average of the actual and forecast values. However, whenever the actual value is zero, the sMAPE entry will have a value of two, regardless of the forecast. If the actual is zero and our forecast is 1, then sMAPE = 1 / ((0 + 1) / 2) = 2. If our forecast is 100, then sMAPE = 100 / ((0 + 100) / 2) = 2. Therefore, sMAPE cannot be recommended because when actual demand is zero, it does not discriminate between forecasting methods.

Scale-independent metrics

Scale-independent metrics are required to assess forecast accuracy across a range of items. For intermittent data, a good scale-independent measure is the ratio of the mean absolute error to the mean demand, as suggested by Jim Hoover. A variation on this approach is to compare the accuracy of one method to another by taking the ratio of mean absolute errors.
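The two points made above, that an sMAPE entry cannot discriminate between forecasts when actual demand is zero and that the ratio of MAE to mean demand is unaffected by scale, can be checked with a short sketch. The function names and the demand figures are hypothetical, and the ratio function assumes the series contains at least one non-zero demand so that mean demand is not zero.

```python
def smape_term(actual, forecast):
    """Single-period sMAPE entry: |actual - forecast| / ((actual + forecast) / 2)."""
    return abs(actual - forecast) / ((actual + forecast) / 2)


def mae_to_mean_ratio(actuals, forecasts):
    """Mean absolute error divided by mean demand.

    Assumes at least one non-zero demand, so that mean demand is positive.
    """
    mae = sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)
    mean_demand = sum(actuals) / len(actuals)
    return mae / mean_demand


# With zero actual demand, very different forecasts give the same sMAPE entry.
print(smape_term(0, 1))    # 2.0
print(smape_term(0, 100))  # 2.0

# The same demand pattern at two scales gives the same MAE-to-mean ratio.
print(mae_to_mean_ratio([0, 0, 4, 0], [1, 1, 1, 1]))            # 1.5
print(mae_to_mean_ratio([0, 0, 400, 0], [100, 100, 100, 100]))  # 1.5
```

Because the second item is simply the first multiplied by 100, its MAE is 100 times larger, yet the ratio is unchanged, which is what allows results to be combined across SKUs.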
Alternatively, instead of MAEs, we can compute the ratio of the geometric root mean square error (GRMSE) of one method to that of another. Although this metric is more complex, it is even more robust (less sensitive) than the MAE regarding outlying observations. Fildes (1992) showed that, in the GRMSE calculation, the distorting effect of large errors cancels out. For details on the application of the GRMSE to intermittent-demand items, see Syntetos and Boylan (2005).

In his accompanying article, Rob Hyndman correctly notes that the geometric root mean square error is identical to the geometric mean absolute error (GMAE). Because the GMAE is easier to calculate than the GRMSE, and delivers the same result, we will use it in the example that follows.

The geometric mean is an alternative to averaging by the arithmetic mean, in which we multiply all n observations and then find the nth root. For example, suppose we have three observations: 1, 4, and 16. The geometric mean is 4, as this number is the cube root of 64 (= 1 x 4 x 16). This approach can also be applied to the absolute forecast errors. Suppose we have four forecast errors: -3, 1, -5, and 4. The absolute errors are 3, 1, 5, and 4. Then the geometric mean is the fourth root of 60 (= 3 x 1 x 5 x 4), namely 2.783. This is the geometric mean absolute error, and it is identical to the geometric root mean square error.

A potential problem with the GMAE is that if any one forecast error is zero, then the GMAE is also zero, regardless of the size of the other forecast errors. Zero forecast errors can arise in two ways for intermittent demand:
1. Non-zero demand: identical non-zero is forecast.
2. Zero actual demand: zero is forecast.
In our experience, the first case does not arise frequently in practice, and it never occurred on the dataset of 3,000 SKUs analyzed by Syntetos and Boylan (2005).
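The worked example above, together with the zero-error caveat, can be reproduced with a short sketch. The function names are hypothetical, and working through logarithms instead of multiplying the errors directly is an implementation choice made here to avoid very large products on long series; it is not something prescribed by the article.

```python
import math


def gmae(errors):
    """Geometric mean absolute error: nth root of the product of |errors|."""
    abs_errors = [abs(e) for e in errors]
    if any(e == 0 for e in abs_errors):
        return 0.0  # a single zero error forces the whole product to zero
    mean_log = sum(math.log(e) for e in abs_errors) / len(abs_errors)
    return math.exp(mean_log)


def grmse(errors):
    """Geometric root mean square error: square root of the geometric mean
    of the squared errors."""
    squared = [e * e for e in errors]
    if any(sq == 0 for sq in squared):
        return 0.0
    mean_log = sum(math.log(sq) for sq in squared) / len(squared)
    return math.sqrt(math.exp(mean_log))


errors = [-3, 1, -5, 4]
print(round(gmae(errors), 3))   # 2.783, the fourth root of 60
print(round(grmse(errors), 3))  # 2.783, identical to the GMAE
print(gmae([0, 5, -2]))         # 0.0, whatever the other errors are
```

The last line shows why series containing an exact zero error need separate treatment: the metric collapses to zero and no longer reflects the size of the remaining errors.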
Methods based on exponential smoothing (ES), such as Croston’s method and the Syntetos-Boylan approximation, do not generally produce whole-number estimates of the mean demand; therefore they do not typically generate zero errors. Consequently, series with zero GMAEs will be rare if ES-based methods are compared, and they can be excluded from an across-series analysis. If other methods are used, such as the naïve method, then the GMAE will not always be well defined. However, the naïve method is sensitive to large demands and will generate high forecasts in such instances, making it inappropriate for practical inventory applications.

The second case, highlighted to us in a private e-mail correspondence by Jack Hayya, can occur when it has been a very long time since there have been any non-zero observations. This may signal that the item is at the end of its life and should be reviewed for classification as “obsolescent,” requiring no subsequent forecasts. If the item is nearing obsolescence (but is not yet obsolete), there would have been some evidence of demand in recent years, and a zero mean demand forecast is inappropriate and should be reviewed.

Estimates of Demand Variance

Why does the variance of forecast error need to be estimated? There are two reasons: (1) in some cases, the variance of demand is estimated as an intermediate step in finding a percentile of forecast demand, or the probability of high values of demand; (2) in other cases, the variance of demand is input to a formula that will be used to estimate inventory parameters, such as the reorder point (s).

It is not possible to assess the accuracy of variance estimates directly, unless assumptions are made about the demand distribution. However, indirect approaches are available. If the variance is estimated to find a percentile of demand, we can examine the accuracy of the resulting percentile estimate.
To do this, we identify the percentile of interest (for example, the 90th percentile) and compare how many observations exceed the percentile estimate against the expected value. This can be achieved using the chi-square test, as discussed by Tom Willemain. A similar approach can be adopted if the variance estimate is used to calculate probabilities of high values of demand.

If the variance is used as an input to an inventory formula, we can look at the measures of inventory cost and service. This would enable different approaches to variance estimation to be compared indirectly and is an example of the accuracy-implication approach advocated in this paper.

Conclusions

In considering forecasting-accuracy metrics for intermittent demand, we should begin by looking at the inventory method. Which forecasts are required for the particular inventory method? The answer may be estimates of mean demand, variance of demand, percentiles of demand, or probabilities of high-demand values. There are appropriate accuracy metrics for each type of estimate.

No matter which inventory system is in use, the accuracy-implication metrics of stock-holding costs and service levels should always be considered because these are of prime importance to the organization. The use of these measures should not be limited to situations in which it is difficult to assess forecast error directly. Accuracy-implication metrics also offer a basis for the comparison of different forecasting methods.

References

Boylan, J. (2005). Intermittent and lumpy demand: A forecasting challenge, Foresight: The International Journal of Applied Forecasting, Issue 1, 36-42.
Eaves, A. H. C. & Kingsman, B. G. (2004). Forecasting for the ordering and stock-holding of spare parts, Journal of the Operational Research Society, 55, 431-437.
Fildes, R. (1992). The evaluation of extrapolative forecasting methods, International Journal of Forecasting, 8, 91-98.
Kahn, K. B. (2006). Commentary: Putting forecast accuracy into perspective, Foresight: The International Journal of Applied Forecasting, Issue 3, 25-26.
Naddor, E. (1975). Optimal and heuristic decisions on single and multi-item inventory systems, Management Science, 21, 1234-1249.
Silver, E. A., Pyke, D. F. & Peterson, R. (1998). Inventory Management and Production Planning and Scheduling, 3rd ed., New York: John Wiley & Sons.
Syntetos, A. A. & Boylan, J. E. (2005). The accuracy of intermittent demand estimates, International Journal of Forecasting, 21, 303-314.
Willemain, T. R., Smart, C. N. & Schwarz, H. F. (2004). A new approach to forecasting intermittent demand for service parts inventories, International Journal of Forecasting, 20, 375-387.

Contact Info:
John Boylan, Buckinghamshire Chilterns University College, UK, [email protected]
Aris Syntetos, University of Salford, UK, [email protected]