Forecast the Material, Not the Product — Cheat Sheet

An IE MBD Capstone for a Spanish road-safety manufacturer. Aggregating SKUs into raw materials, splitting by purchase volume, and routing series to Prophet or XGBoost based on shape.

Updated June 2026
1

The problem

The client (a Spanish road-safety equipment manufacturer) faced two structural facts:

  1. Steel lead time is months. Procurement decisions had to be made far ahead of order intake.
  2. Hundreds of finished products share a few dozen raw materials. Most series at the SKU level were too noisy to forecast reliably.

The intuition: don't forecast every SKU. Forecast at the level where decisions actually get made — raw material consumption.

2

The aggregation move

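Each finished product is a known combination of raw materials, given by its bill of materials (BOM). The obvious route is to forecast every SKU and convert the result into materials:

forecast(raw_material) = Σ over SKUs of forecast(SKU) × BOM(SKU, raw_material)

The project inverts the order and converts the sales history instead:

actual_consumption(raw_material) = Σ over SKUs of actual_sales(SKU) × BOM(SKU, raw_material)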

Aggregate the history first, then forecast that. Result: hundreds of noisy SKU series → dozens of cleaner material series.

This is the move that made everything tractable.
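
A minimal sketch of that aggregation step, using only the Python standard library. The BOM table, SKU names, material names, and quantities are hypothetical placeholders, not the client's data; the function simply applies the consumption formula above to a list of sales records.

```python
from collections import defaultdict

# Hypothetical BOM: kg of each raw material per unit of finished product.
BOM = {
    "barrier_4m": {"steel_coil": 42.0, "zinc": 1.5},
    "sign_post": {"steel_coil": 8.0, "zinc": 0.3},
    "bollard": {"steel_tube": 12.5},
}

# Hypothetical sales history: (period, sku, units sold).
SALES = [
    ("2024-01", "barrier_4m", 10),
    ("2024-01", "sign_post", 50),
    ("2024-02", "barrier_4m", 4),
    ("2024-02", "bollard", 20),
]


def material_consumption(sales, bom):
    """Convert SKU-level sales into per-period raw-material consumption."""
    consumption = defaultdict(float)  # (period, material) -> quantity
    for period, sku, units in sales:
        for material, qty_per_unit in bom[sku].items():
            consumption[(period, material)] += units * qty_per_unit
    return dict(consumption)


history = material_consumption(SALES, BOM)
print(history[("2024-01", "steel_coil")])  # 10 * 42.0 + 50 * 8.0
```

The output is one series per material, keyed by period, which is what the forecasting models then see.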

3

The routing

Not every material series has the same shape. Two regimes:

Series type      | Examples                                           | Model
Heavy + regular  | Top 20 % of materials by volume, recurring pattern | Prophet
Long tail        | Sporadic, irregular, intermittent demand           | XGBoost

Why Prophet for the heavy ones: built-in seasonality, holiday effects, robust to gaps. Fast, interpretable.

Why XGBoost for the tail: with lag features + calendar features, it handles the irregularity better than Prophet's smooth additive model.
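
One way the routing rule could be expressed, as a sketch under stated assumptions. The 20 % cut-off comes from the table above; the regularity test (share of periods with non-zero consumption) and its 0.8 threshold are hypothetical stand-ins for whatever "recurring pattern" meant in the project. The function only returns a label per material and does not call either library.

```python
import math


def route_series(series_by_material, top_share=0.20, min_active_share=0.8):
    """Assign each material series to 'prophet' or 'xgboost'.

    top_share: fraction of materials (ranked by total volume) treated as heavy.
    min_active_share: hypothetical regularity threshold, the minimum share
    of periods with non-zero consumption.
    """
    ranked = sorted(
        series_by_material,
        key=lambda m: sum(series_by_material[m]),
        reverse=True,
    )
    n_heavy = math.ceil(top_share * len(ranked))
    heavy = set(ranked[:n_heavy])

    routes = {}
    for material, series in series_by_material.items():
        active_share = sum(1 for x in series if x > 0) / len(series)
        is_regular = active_share >= min_active_share
        routes[material] = "prophet" if material in heavy and is_regular else "xgboost"
    return routes


# Hypothetical consumption histories, one list per material.
example = {
    "steel_coil": [820, 790, 805, 840],
    "zinc": [30, 28, 0, 31],
    "steel_tube": [0, 250, 0, 0],
    "rubber": [5, 0, 0, 7],
    "paint": [12, 11, 13, 12],
}
routes = route_series(example)
```

With five materials, the top 20 % is a single series, so only `steel_coil` is routed to Prophet here; everything else falls into the long tail.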

4

Feature engineering

For the XGBoost track:

  • Lag features: consumption(t−1), consumption(t−7), consumption(t−28).
  • Rolling stats: mean, std, max over the past 4 / 8 / 12 weeks.
  • Calendar features: month, week-of-year, is-holiday, days-to-next-quarter.
  • Aggregations across related materials: leakage-safe via group keys.
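
The lag and rolling features in the list above can be sketched in plain Python as follows. This is an illustration, not the project's pipeline: it treats lags and windows as counts of periods in whatever frequency the series uses, it picks the population standard deviation arbitrarily, and it omits the calendar and cross-material features. The function name and the feature names are hypothetical.

```python
from statistics import mean, pstdev


def make_features(series, lags=(1, 7, 28), windows=(4, 8, 12)):
    """Build one feature row per time step that has full history available.

    Every feature at step t uses only values strictly before t,
    so the target series[t] never leaks into its own features.
    """
    start = max(max(lags), max(windows))
    rows = []
    for t in range(start, len(series)):
        row = {"t": t, "target": series[t]}
        for lag in lags:
            row[f"lag_{lag}"] = series[t - lag]
        for w in windows:
            past = series[t - w:t]  # the w values before t, excluding t itself
            row[f"roll_mean_{w}"] = mean(past)
            row[f"roll_std_{w}"] = pstdev(past)
            row[f"roll_max_{w}"] = max(past)
        rows.append(row)
    return rows


# Toy series 0, 1, ..., 39 so each feature is easy to check by eye.
rows = make_features(list(range(40)))
```

Rows start only once the longest lag is available, so a series of 40 points yields 12 training rows with the default lags.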

For Prophet: just the time series + holiday calendar. Prophet's value is built-in seasonality fitting.

5

Validation

Time-series cross-validation:

  • Rolling-origin evaluation: train on [0, t], score on (t, t+h], then slide the origin forward.
  • Never shuffle. Never split randomly.
  • Compare each model to a seasonal-naïve baseline: many "fancy" forecasts don't beat last-year-same-month.
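
A sketch of rolling-origin evaluation with a seasonal-naïve baseline, assuming monthly data with a 12-period season. The function names, the minimum training length, the horizon, and the toy series are all hypothetical choices for illustration.

```python
def rolling_origin_splits(n_obs, min_train, horizon, step=1):
    """Yield (train_idx, test_idx); the test window starts right after the training window."""
    origin = min_train
    while origin + horizon <= n_obs:
        yield range(0, origin), range(origin, origin + horizon)
        origin += step


def seasonal_naive(train, horizon, season_length=12):
    """Forecast each future step with the value observed one season earlier."""
    base = len(train) - season_length  # start of the last full season
    return [train[base + (h % season_length)] for h in range(horizon)]


# Hypothetical, perfectly seasonal monthly series (three years).
series = [100 + 10 * (m % 12) for m in range(36)]

errors = []
for train_idx, test_idx in rolling_origin_splits(len(series), min_train=24, horizon=3, step=3):
    train = [series[i] for i in train_idx]
    actual = [series[i] for i in test_idx]
    predicted = seasonal_naive(train, horizon=len(actual))
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    errors.append(mae)
```

Any candidate model would be scored on the same folds, so its error can be compared fold by fold against the baseline's.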

Metrics: MAPE for relative comparison, RMSE for absolute error in tonnes of steel.
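
Both metrics are short enough to write out. In this sketch, skipping periods with zero actual consumption in MAPE is an assumption added here, not something the project states: the percentage error is undefined when the actual is zero, which happens often in intermittent long-tail series.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Periods with a zero actual are skipped (an assumption of this sketch),
    because the percentage error is undefined there.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual is zero")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the series."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical actuals and forecasts, in tonnes.
actual = [100, 200, 0, 50]
predicted = [110, 180, 5, 50]
print(round(mape(actual, predicted), 2), round(rmse(actual, predicted), 2))
```

RMSE still counts the zero-demand period that MAPE skips, which is one reason to report the two side by side.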

6

What I learned

  • Aggregating before forecasting is often more valuable than tuning the model.
  • Hybrid routing (one tool per series shape) beats forcing one model on everything.
  • Seasonal-naïve is a brutal baseline. Half the time it wins. Always include it.
  • The point of forecasting in supply chain isn't the lowest MAPE. It's a forecast that planners trust enough to commit budget against. Interpretability matters.