Demand Forecasting in Practice

Retailers want to sell as many goods as possible, but they also fear high holding costs for unsold inventory. With demand forecasting, one can stock just enough of each item to meet the demand.

Maintain lowest inventory level to meet the expected customer service level. (Seaman|WalmartLabs)

where the expected customer service level comes from demand forecasting.

Service providers (e.g., food delivery, taxi, call center) have similar needs. Instacart scheduled drivers based on forecasted demand, so that customers can get their groceries delivered and shoppers can be well-utilized (Putrevu|Instacart, Deng|Instacart).

Similarly, Uber directed drivers to areas with high forecasted demand so that both drivers and customers benefit (Bell|Uber).

In a nutshell

Demand forecasting is commonly applied for the purpose of sustainable growth.

Evaluating Forecasting Models

Let’s first discuss how to evaluate a forecasting model.

If you can’t measure it, you can’t improve it.

Denote actual and forecasted demand as \(x_t\) and \(\hat{x}_t\). The error at time \(t\) is \(e_t = x_t - \hat{x}_t\).

Forecasting bias indicates whether a model consistently over- or under-forecasts.

\[ \sum_{t} e_t \]

Mean absolute error measures how far the forecast is from the actual demand, ignoring whether it is under- or over-forecasting.

\[ \frac{1}{n} \sum_{t} |e_t| \]

The root mean squared error is a similar measure, also in the same units as the demand.

\[ \sqrt{ \frac{\sum_{t} e_t^2}{n} } \]

Here \(n\) is the number of time points. Note that the root mean squared error penalizes larger errors more heavily.
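As a minimal sketch, the three measures can be computed as follows. The function name and the demand numbers are made up for illustration, and the mean absolute error is averaged over the \(n\) time points, as its name suggests.

```python
import math

def forecast_metrics(actual, forecast):
    """Return (bias, MAE, RMSE) for paired actual and forecasted demand."""
    errors = [x - x_hat for x, x_hat in zip(actual, forecast)]  # e_t = x_t - x_hat_t
    n = len(errors)
    bias = sum(errors)                      # signed sum; positive means under-forecasting overall
    mae = sum(abs(e) for e in errors) / n   # average size of the error, sign ignored
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # squares weigh large errors more
    return bias, mae, rmse

# Made-up demand and forecasts for four periods.
actual = [100, 120, 90, 110]
forecast = [90, 125, 100, 100]
bias, mae, rmse = forecast_metrics(actual, forecast)
print(bias, mae, rmse)
```

In this example the errors mostly cancel in the bias while the mean absolute error and root mean squared error stay large, which is why bias alone is not enough to judge a model.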

One often re-runs forecasting as new data become available. Forecasting volatility, such as the standard deviation of the forecasts across runs, measures how much a forecast changes between runs. High volatility of demand forecasts can result in overstock (Seaman|WalmartLabs).

Say the forecast changes from 100 to 80. If one stocks 100 units according to the first forecast, then 20 units are over-stocked according to the new forecast.
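The 100-to-80 example can be sketched in a few lines. The helper names are hypothetical, and the sample standard deviation is only one possible choice of volatility measure.

```python
import statistics

def forecast_volatility(run_forecasts):
    """Sample standard deviation of forecasts made for the same period by different runs."""
    return statistics.stdev(run_forecasts)

def overstock_after_revision(stocked_units, revised_forecast):
    """Units already stocked that the revised forecast no longer expects to sell."""
    return max(stocked_units - revised_forecast, 0)

# Two successive runs forecasting the same period: 100 units, then 80 units.
runs = [100, 80]
volatility = forecast_volatility(runs)
overstock = overstock_after_revision(stocked_units=runs[0], revised_forecast=runs[-1])
print(volatility, overstock)
```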

One should also consider business metrics such as lost sales and over-supply cost to measure the impact.

What really matters is how much demand forecasting improves the business.

Note that factors other than forecasting could also impact business metrics. For example, even if an item is under-forecasted and under-stocked, customers may choose a similar item, which is not a lost sale.
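A rough sketch of the two business metrics, in units rather than money, might look like this. The function name and numbers are illustrative assumptions. Because customers may substitute a similar item, the unmet demand computed here is only an upper bound on lost sales.

```python
def stock_outcomes(demand, stocked):
    """Return (unmet_units, oversupply_units) summed over periods.

    unmet_units is an upper bound on lost sales: some customers may
    buy a substitute item instead of leaving empty-handed.
    """
    unmet = sum(max(d - s, 0) for d, s in zip(demand, stocked))
    oversupply = sum(max(s - d, 0) for d, s in zip(demand, stocked))
    return unmet, oversupply

# Made-up actual demand and stock levels (set from forecasts) for three periods.
demand = [100, 80, 120]
stocked = [90, 100, 120]
unmet_units, oversupply_units = stock_outcomes(demand, stocked)
print(unmet_units, oversupply_units)
```

Multiplying each quantity by a per-unit margin or holding cost would turn these counts into costs. Those rates are business-specific and are left out here.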

Modeling

There are mainly two types of models for demand forecasting: time series and feature-based methods.

Time series models

Time series models such as ARIMA (Auto-Regressive Integrated Moving Average) and Holt-Winters are based purely on the past demand values.

Here we illustrate what ARIMA(\(p\),\(d\),\(q\)) means:

\(\boldsymbol d\) is the number of differences needed to make the series stationary (constant mean, variance and autocorrelation).

Let \(y_t\) denote the differenced series: when \(d = 0\), \(y_t = x_t\); when \(d = 1\), \(y_t = x_t - x_{t-1}\).

\(\boldsymbol p\) is the number of auto-regressive (AR) terms (future demand regressed on past demand), and \(\boldsymbol q\) is the number of lagged forecast error (MA) terms (future demand regressed on past forecasting errors). So ARIMA(\(p\),\(d\),\(q\)) becomes

\[\hat{y}_t = u + a_1 y_{t-1} + \dots + a_p y_{t-p} + b_1 e_{t-1} + \dots + b_q e_{t-q}\] where \(u\) is a constant.
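To make the equation concrete, here is a minimal sketch of differencing and a one-step forecast. The coefficients \(u\), \(a_i\), \(b_j\) and the past error below are made-up values, not fitted ones. In practice a library would estimate them from the data.

```python
def difference(x, d):
    """Apply first-order differencing d times: y_t = x_t - x_{t-1}."""
    y = list(x)
    for _ in range(d):
        y = [curr - prev for prev, curr in zip(y, y[1:])]
    return y

def arma_one_step(y, errors, u, a, b):
    """y_hat_t = u + sum_i a_i * y_{t-i} + sum_j b_j * e_{t-j}."""
    ar_part = sum(a_i * y[-i] for i, a_i in enumerate(a, start=1))
    ma_part = sum(b_j * errors[-j] for j, b_j in enumerate(b, start=1))
    return u + ar_part + ma_part

# ARIMA(1,1,1) with illustrative (not fitted) coefficients.
x = [100, 104, 109, 115]          # made-up demand history
y = difference(x, d=1)            # [4, 5, 6]
past_errors = [0.4]               # assumed last forecast error on the differenced series
y_hat = arma_one_step(y, past_errors, u=1.0, a=[0.5], b=[0.3])
x_hat = x[-1] + y_hat             # undo the single difference to get a demand forecast
print(y, y_hat, x_hat)
```

With \(d = 1\), the forecast of the differenced series is added back to the last observed demand to recover a forecast on the original scale.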

A seasonal ARIMA model, ARIMA(\(p\),\(d\),\(q\))(\(P\),\(D\),\(Q\)), is a generalized version considering seasonality. \(D\) is the number of seasonal differences, \(P\) is the number of seasonal AR terms, and \(Q\) is the number of seasonal MA terms.

The following graph shows a seasonal ARIMA(0,1,1)(0,1,1) applied to forecast an airline’s monthly demand (forecast R package) after 1956. The black line is the actual demand, the blue line is the predicted demand, and the shaded area represents prediction uncertainty.
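A seasonal difference subtracts the value one season earlier instead of the previous value. The sketch below assumes a season length called period (4 for quarterly data, 12 for monthly data), a parameter name that is not used in the text above.

```python
def seasonal_difference(x, period):
    """y_t = x_t - x_{t-period}; the first `period` points have no counterpart and are dropped."""
    return [x[t] - x[t - period] for t in range(period, len(x))]

# Made-up quarterly demand for two years (period = 4).
quarterly = [10, 20, 30, 40, 12, 23, 34, 45]
seasonal_diff = seasonal_difference(quarterly, period=4)   # D = 1
# A further non-seasonal difference (d = 1) on top of the seasonal one.
both_diff = [curr - prev for prev, curr in zip(seasonal_diff, seasonal_diff[1:])]
print(seasonal_diff, both_diff)
```

The within-year pattern disappears after the seasonal difference, and the remaining year-over-year growth is removed by the ordinary difference.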

Feature-based models

While time-series models consider only past demand, feature-based models, typically supervised learning models, can model exogenous factors such as weather and traffic. These models can have an advantage given sufficient data and relevant exogenous variables. Even the output of a time-series model can be included as a variable.

When the number of exogenous variables is large and their interactions are complex, complex models can be beneficial (Bell|Uber). Recurrent neural networks significantly improved the accuracy of Uber’s demand forecasts during extreme events such as New Year’s Eve (Laptev|Uber).
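A feature-based model needs the series turned into a table of inputs and targets. The sketch below builds one with lagged demand and a single exogenous variable. The temperature column, the function name and the numbers are assumptions for illustration, and the exogenous value for the target period is assumed to be known in advance (e.g., from a weather forecast).

```python
def build_features(demand, temperature, n_lags=2):
    """Return (rows, targets) for supervised learning.

    Each row is [demand_{t-1}, ..., demand_{t-n_lags}, temperature_t],
    and the matching target is demand_t.
    """
    rows, targets = [], []
    for t in range(n_lags, len(demand)):
        lags = [demand[t - k] for k in range(1, n_lags + 1)]
        rows.append(lags + [temperature[t]])
        targets.append(demand[t])
    return rows, targets

# Made-up daily demand and temperature.
demand = [10, 12, 15, 11, 14]
temperature = [20, 22, 25, 19, 23]
rows, targets = build_features(demand, temperature, n_lags=2)
print(rows, targets)
```

Any supervised learner can then be trained on rows and targets. A time-series model's forecast for period \(t\) could be appended to each row as one more column.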

It should be noted that one can start forecasting demand with time-series models, and further explore more complex approaches when accuracy is too low.

In practice, classical statistical algorithms [time-series models] tend to be much quicker and easier-to-use (Bell|Uber).

Validation

Regular cross-validation can under-estimate the forecasting error because the data points in a time series are highly correlated. Instead, for a given point in time, one can train a model on the data before that time and validate it on the data afterwards. The forecasting errors at different timestamps are collected and averaged to get the validation error.

The graph below shows two ways of validation (code was adapted from Dr. Rob Hyndman). The left graph uses a fixed window of past data for training, while the right one uses all the past data for training. Both forecast the next data point.

When there are not enough data for validation, one can use information criteria such as the AIC or the BIC to evaluate the model (Hannachi|Nordstrom).
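The two schemes can be sketched as index splits, each forecasting one step ahead. This is an illustrative Python sketch, not the code referenced above. The placeholder model (mean of the training window) and the data are assumptions chosen so that the two schemes give different results.

```python
def rolling_origin_splits(n, min_train, window=None):
    """Yield (train_indices, test_index) pairs for one-step-ahead validation.

    window=None trains on all past data; an integer keeps only the
    most recent `window` points (fixed-size window).
    """
    for t in range(min_train, n):
        start = 0 if window is None else max(0, t - window)
        yield list(range(start, t)), t

def mean_forecast(train_values):
    """Placeholder model: forecast the average of the training data."""
    return sum(train_values) / len(train_values)

def validation_error(series, min_train, window=None):
    """Average absolute one-step-ahead error over all forecast origins."""
    errors = []
    for train_idx, test_idx in rolling_origin_splits(len(series), min_train, window):
        prediction = mean_forecast([series[i] for i in train_idx])
        errors.append(abs(series[test_idx] - prediction))
    return sum(errors) / len(errors)

series = [10, 12, 14, 16, 18, 20]   # made-up, steadily growing demand
expanding_error = validation_error(series, min_train=3)        # all past data
fixed_error = validation_error(series, min_train=3, window=3)  # last 3 points only
print(expanding_error, fixed_error)
```

On this trending series the fixed window gives the lower error, because old data drag the mean forecast down. With a different model or series the comparison could go the other way.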
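For reference, with \(k\) estimated parameters and maximized likelihood \(L\), AIC is \(2k - 2\ln L\) and BIC is \(k \ln n - 2\ln L\). The sketch below assumes Gaussian errors and drops constant terms, so the values are only meaningful for comparing models fitted to the same data. Lower is better. The function name and residuals are made up.

```python
import math

def aic_bic(residuals, n_params):
    """AIC and BIC under a Gaussian error assumption, up to an additive constant."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)     # residual sum of squares
    fit_term = n * math.log(rss / n)        # equals -2 ln L up to a constant
    aic = fit_term + 2 * n_params
    bic = fit_term + n_params * math.log(n)
    return aic, bic

# Made-up in-sample residuals of a model with 2 estimated parameters.
aic, bic = aic_bic([1, -1, 2, -2], n_params=2)
print(aic, bic)
```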

Miscellaneous

Handling sparse data

When forecasting demand for a large number of items (common for retailers), data can be missing (data tracking issues or new items) or truncated (e.g., the demand of an out-of-stock item). In this case, it is useful to cluster similar items together to generate sufficient data for modeling. Ideally, a similarity metric less sensitive to scale and outliers is used for clustering, e.g., Spearman correlation. Text information about the items may also be leveraged for clustering (Seaman|WalmartLabs).
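To illustrate why Spearman correlation is less sensitive to scale and outliers, here is a small self-contained sketch: Spearman correlation is the Pearson correlation of the ranks. The two item series are made up, and a real pipeline would likely use a statistics library instead.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(a), ranks(b))

# Two made-up items whose demand moves together; item_b has a larger scale and one spike.
item_a = [1, 2, 3, 4, 5]
item_b = [10, 20, 30, 40, 5000]
print(pearson(item_a, item_b), spearman(item_a, item_b))
```

The spike pulls the Pearson correlation well below 1, while the Spearman correlation still reports that the two items rise and fall together, which is the property one wants when grouping items for pooled modeling.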

Timescale

Based on different timescales, forecasting can be categorized into short-term, medium-term and long-term.

In the electricity sector, short-term forecasts (minutes to days ahead) can be used for day-to-day operations such as coordinating electrical generators (storing electricity is difficult), medium-term forecasts (days to months ahead) for derivatives pricing, and long-term forecasts (months to years ahead) for future site investment decisions (Energy forecasting|Wikipedia).

For retailers, short-term forecasts can be used for pricing to account for the gap between inventory and demand, while longer-term forecasts can be used for replenishment as it takes days/months for orders to arrive (Seaman|WalmartLabs).

Forecasting for different timescales should also be handled differently. Short-term forecasts rely mostly on recent demand data, and therefore retraining a model to include the most recent data points can make a large impact (Hannachi|Nordstrom, Hong|UNC). Medium- and long-term forecasts could benefit from the inclusion of trend, seasonality and external factors such as weather, and even economic factors (Hong|UNC).

Demand elasticity

When there is high variance in demand, it is hard to meet the demand with a low risk of oversupply. In some cases, supply simply cannot meet the demand during peak times. When customers are flexible about when they are served, or about which items they purchase, considering elastic demand can smooth the demand and make it easier to arrange supply (Deng|Instacart).

Interpretability

In a business context, especially when multiple stakeholders are involved, one may prefer an interpretable forecasting model that is easy to explain and diagnose (Nantes|Blue Apron).

Forecasting is the art of saying what will happen, and then explaining why it didn’t!

References

Forecasting at Uber: An Introduction. Franziska Bell and Slawek Smyl | Uber | 2018
Leveraging Elastic Demand for Forecasting. Houtao Deng, Ganesh Krishnan, Ji Chen, Dong Liang | Instacart | 2018
No order left behind; no shopper left idle. Jagannath Putrevu | Instacart | 2017
Retail Sales Forecasting at Walmart. Brian Seaman | WalmartLabs | 2017
Energy forecasting. Wikipedia
Forecasting Demand at Blue Apron. Alfredo Nantes | Blue Apron | 2017
Engineering Extreme Event Forecasting at Uber with Recurrent Neural Networks. Nikolay Laptev, Slawek Smyl, Santhosh Shanmugam | Uber | 2018
3 facts about time series forecasting that surprise experienced machine learning practitioners. Skander Hannachi | Nordstrom | 2018
Very Short, Short, Medium and Long Term Load Forecasting. Tao Hong | University of North Carolina at Charlotte | 2014