The Computing Series

Real-World Variants

Netflix uses a multi-stage pipeline with separate models for different recommendation surfaces (home page, because-you-watched, similar items). The home page ranking uses tens of features; the similar items surface uses primarily item-embedding similarity.

YouTube published the 2016 paper “Deep Neural Networks for YouTube Recommendations” — the canonical reference for multi-stage recommendation. It describes two neural networks: one for candidate generation (matching a user embedding against item embeddings), and one for ranking (many features, including impression history).
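The two-stage split can be sketched with tiny made-up embeddings standing in for the learned ones; the 0.5 impression penalty and the value of k below are invented for illustration:

```python
# Toy sketch of the two-stage split: candidate generation narrows the
# catalogue by nearest-neighbour search in a shared embedding space;
# ranking re-scores the small survivor set with extra features.
# Embeddings, the impression penalty, and k are all invented here.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def generate_candidates(user_emb, item_embs, k):
    """Keep the k items whose embeddings are closest to the user's."""
    by_sim = sorted(item_embs, key=lambda i: dot(user_emb, item_embs[i]), reverse=True)
    return by_sim[:k]

def rank(user_emb, candidates, item_embs, impressions):
    """Re-score candidates; demote items already shown to the user."""
    def score(item):
        return dot(user_emb, item_embs[item]) - 0.5 * impressions.get(item, 0)
    return sorted(candidates, key=score, reverse=True)

item_embs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
user_emb = [1.0, 0.2]
candidates = generate_candidates(user_emb, item_embs, k=2)   # ["a", "b"]
ranked = rank(user_emb, candidates, item_embs, {"a": 3})     # ["b", "a"]
```

Note how ranking can reverse the candidate order: item "a" matched best on embedding similarity alone, but its impression history pushes it below "b".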

Spotify emphasises audio feature similarity alongside collaborative filtering. Track embeddings are built from audio analysis (tempo, key, energy) combined with listening behaviour embeddings.
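One simple way to realise that combination — purely illustrative, not Spotify's actual pipeline — is to concatenate a scaled audio-analysis vector with a collaborative-filtering vector, so both signals contribute to track similarity:

```python
# Illustrative only: build a track vector by concatenating an
# audio-analysis vector (tempo, key, energy, scaled to comparable
# ranges) with a collaborative-filtering vector. Scaling constants
# and all vectors below are invented.

def audio_vector(tempo_bpm, key, energy):
    """Scale raw audio features into roughly [0, 1] each (assumed ranges)."""
    return [tempo_bpm / 200.0, key / 11.0, energy]

def track_vector(audio_vec, cf_vec):
    """Concatenate the two signals into one embedding."""
    return audio_vec + cf_vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

t1 = track_vector(audio_vector(120, 5, 0.8), [0.9, 0.1])
t2 = track_vector(audio_vector(124, 5, 0.7), [0.8, 0.2])
sim = cosine(t1, t2)  # near 1.0: similar audio and similar listeners
```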

Amazon uses item-to-item collaborative filtering at massive scale. “Customers who bought X also bought Y” is a precomputed similarity table, updated continuously as purchases occur.
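A minimal version of that precomputed table, with toy baskets and raw co-purchase counts standing in for a real similarity score (which would also normalise for item popularity):

```python
# Sketch of an item-to-item "also bought" table built offline from
# order baskets; serving is then a dictionary lookup. Baskets are
# invented for illustration.
from collections import defaultdict
from itertools import combinations

def build_also_bought(baskets):
    """Count how often each pair of items appears in the same order."""
    co = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        for x, y in combinations(sorted(set(basket)), 2):
            co[x][y] += 1
            co[y][x] += 1
    # For each item, neighbours sorted by co-purchase count, best first.
    return {item: sorted(nbrs, key=nbrs.get, reverse=True)
            for item, nbrs in co.items()}

baskets = [["book", "lamp"], ["book", "pen"], ["book", "pen"], ["lamp", "pen"]]
table = build_also_bought(baskets)
table["book"]  # ["pen", "lamp"] — pen co-bought twice, lamp once
```

The "updated continuously" property then amounts to incrementing the pair counts as new orders arrive, rather than rebuilding the table from scratch.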

Concept: Recommendation Engine

Thread: T11 (Feedback) ← Ch 6 (API Gateway) → Ch 18 (Ride-Sharing); T5 (Caching/Memo) ← Ch 8 (Autocomplete) → Ch 21 (ML Feature Store)

Core Idea: A multi-stage pipeline — candidate generation, scoring, re-ranking — makes recommendation tractable at scale by narrowing 10M items to 1,000 before applying expensive per-item scoring. The feature store separates training-time correctness from serving-time latency.
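Back-of-envelope arithmetic shows why the narrowing step is what makes the latency budget feasible; the 10 µs per-item cost for the heavy ranking model is an assumed figure, not from the text:

```python
# Assumed per-item cost of the expensive ranking model: 10 µs.
CATALOGUE = 10_000_000
CANDIDATES = 1_000
SCORE_COST_US = 10

full_scan_ms = CATALOGUE * SCORE_COST_US / 1000   # scoring everything: 100,000 ms
funnel_ms = CANDIDATES * SCORE_COST_US / 1000     # scoring candidates: 10 ms
```

Under these assumptions, scoring the whole catalogue is four orders of magnitude over a 100 ms budget, while scoring the narrowed candidate set fits comfortably — which is the entire point of the funnel.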

Tradeoff: AT4 — Precomputation vs On-Demand: item embeddings and ANN indexes are precomputed; per-user feature retrieval and model scoring happen on-demand at query time.

Failure Mode: FM4 — Data Consistency Failure: training-serving skew, where features are computed differently during training and serving, causes silent degradation of recommendation quality in production.
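One common mitigation, shown here as an illustrative sketch (function and field names invented): define each feature exactly once and call that definition from both the offline training job and the online server, so the two paths cannot silently diverge.

```python
# Illustrative sketch: the feature "days since last watch" has a single
# definition shared by training and serving, rather than one SQL
# version offline and a reimplemented version in the server.

def days_since_last_watch(now_ts, last_watch_ts):
    """The one shared feature definition (timestamps in seconds)."""
    return (now_ts - last_watch_ts) / 86_400

def training_row(event):
    # Offline: compute the feature as of the logged event's timestamp,
    # not "now", so training matches what serving would have seen.
    return {"recency": days_since_last_watch(event["ts"], event["last_watch_ts"])}

def serving_features(now_ts, user_state):
    # Online: the same code path as training.
    return {"recency": days_since_last_watch(now_ts, user_state["last_watch_ts"])}

row = training_row({"ts": 2 * 86_400, "last_watch_ts": 0})        # recency 2.0
live = serving_features(3 * 86_400, {"last_watch_ts": 86_400})    # recency 2.0
```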

Signal: When a catalogue of more than 10,000 items must be personalised for millions of users in under 100ms, with engagement as the primary quality metric.

Maps to: Reference Book, Framework 6 (System Archetypes)

Read in the book →