Key Design Decisions

AT4 — Precomputation/On-Demand: Candidate generation is precomputed: the item embeddings and the ANN index are both built ahead of time. Scoring is on-demand: features are fetched and the model is evaluated at query time. Re-ranking is also on-demand. The boundary between precomputed and on-demand work therefore sits between candidate generation and scoring.
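
To make the boundary concrete, here is a minimal sketch of the two stages. Everything in it is hypothetical: the item IDs, the embeddings, the `FRESH_FEATURES` table, and the toy scoring rule are illustrative assumptions, and the sorted dot-product lookup merely stands in for a real ANN index.

```python
# --- Precomputed side: built offline, before any query arrives ---
# Hypothetical item embeddings; in a real system these feed a pre-built ANN index.
ITEM_EMBEDDINGS = {
    "item_a": (1.0, 0.0),
    "item_b": (0.8, 0.6),
    "item_c": (0.0, 1.0),
    "item_d": (-1.0, 0.0),
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def generate_candidates(user_embedding, k):
    """Stand-in for an ANN lookup: only reads the precomputed embeddings."""
    ranked = sorted(
        ITEM_EMBEDDINGS,
        key=lambda item: dot(user_embedding, ITEM_EMBEDDINGS[item]),
        reverse=True,
    )
    return ranked[:k]


# --- On-demand side: evaluated at query time with fresh features ---
# Hypothetical feature values that change too quickly to precompute.
FRESH_FEATURES = {
    "item_a": {"ctr": 0.9, "in_stock": 0.0},
    "item_b": {"ctr": 0.5, "in_stock": 1.0},
    "item_c": {"ctr": 0.7, "in_stock": 1.0},
    "item_d": {"ctr": 0.1, "in_stock": 1.0},
}


def score(item_id):
    """Toy scoring model (an assumption): click-through rate gated by stock."""
    features = FRESH_FEATURES[item_id]  # fetched per request
    return features["ctr"] * features["in_stock"]


def recommend(user_embedding, k_candidates=3, k_final=2):
    candidates = generate_candidates(user_embedding, k_candidates)  # precomputed
    return sorted(candidates, key=score, reverse=True)[:k_final]    # on-demand


print(recommend((1.0, 0.2)))
```

In this sketch `item_a` is the closest match by embedding but is dropped at scoring time because it is out of stock, which is the kind of fresh signal that cannot be baked into a pre-built index.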

AT2 — Latency/Throughput: The online feature store (Redis) provides sub-millisecond feature retrieval for individual users. The offline feature store provides high-throughput training data. Training and serving use different stores because latency and throughput requirements are incompatible in a single system.
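
The difference in access patterns can be sketched with two in-memory stand-ins. This is an illustration under stated assumptions, not a real feature store: `OnlineStore`, `OfflineStore`, and `write_features` are hypothetical names, the dictionary plays the role of a key-value store such as Redis, and the dual-write path is assumed rather than taken from the system described above.

```python
class OnlineStore:
    """Stand-in for a key-value store: keeps only the latest features per user."""

    def __init__(self):
        self._latest = {}

    def put(self, user_id, features):
        self._latest[user_id] = dict(features)

    def get(self, user_id):
        # Single-key point lookup: the access pattern serving needs.
        return self._latest.get(user_id, {})


class OfflineStore:
    """Stand-in for a batch store: append-only history, read in bulk."""

    def __init__(self):
        self._rows = []

    def append(self, user_id, ts, features):
        self._rows.append({"user_id": user_id, "ts": ts, **features})

    def scan(self, start_ts, end_ts):
        # Range scan over many rows: the access pattern training needs.
        return [row for row in self._rows if start_ts <= row["ts"] < end_ts]


def write_features(online, offline, user_id, ts, features):
    """Assumed dual write: same values, two stores with different read paths."""
    online.put(user_id, features)
    offline.append(user_id, ts, features)


online, offline = OnlineStore(), OfflineStore()
write_features(online, offline, "u1", 1, {"clicks_7d": 3})
write_features(online, offline, "u1", 2, {"clicks_7d": 4})
write_features(online, offline, "u2", 2, {"clicks_7d": 9})

print(online.get("u1"))          # serving reads one user's latest values
print(len(offline.scan(0, 3)))   # training reads the full history in a window
```

The online store answers "what are this user's features right now?" while the offline store answers "what did every user's features look like over this period?", and neither layout serves the other question well.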

AT9 — Correctness/Performance: Approximate nearest neighbour (ANN) search, for example an HNSW index built with a library such as FAISS, finds candidates in roughly O(log N) time at >95% recall, instead of the O(N) exact scan that guarantees 100% recall. The recall drop is acceptable; the latency reduction (from seconds to milliseconds) is not optional.
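
Recall here means the fraction of the true nearest neighbours that the approximate search returns. The sketch below shows one way to measure it against a brute-force baseline. It does not implement HNSW and does not reproduce its O(log N) behaviour: the approximate search is a much simpler stand-in that probes only the partitions nearest the query, and the grid centroids, dataset size, and function names are all illustrative assumptions.

```python
import random


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def exact_top_k(query, vectors, k):
    """Brute force: compares the query with every vector, so recall is 100%."""
    return sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:k]


def build_partitions(vectors, centroids):
    """Offline step: assign each vector to its nearest centroid."""
    parts = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        parts[nearest].append(i)
    return parts


def approx_top_k(query, vectors, centroids, parts, k, n_probe):
    """Search only the n_probe partitions whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    pool = [i for c in probe[:n_probe] for i in parts[c]]
    return sorted(pool, key=lambda i: sq_dist(query, vectors[i]))[:k]


def recall_at_k(approx_ids, exact_ids):
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


rng = random.Random(0)
vectors = [(rng.random(), rng.random()) for _ in range(2000)]
# Hypothetical 4x4 grid of centroids over the unit square.
centroids = [(x / 4 + 0.125, y / 4 + 0.125) for x in range(4) for y in range(4)]
parts = build_partitions(vectors, centroids)

query = (0.5, 0.5)
exact = exact_top_k(query, vectors, 10)
for n_probe in (1, 4, 16):
    approx = approx_top_k(query, vectors, centroids, parts, 10, n_probe)
    print(n_probe, recall_at_k(approx, exact))
```

Probing fewer partitions means fewer distance computations but a higher chance of missing a true neighbour that sits just across a partition boundary; probing all of them degenerates to the exact scan. Production ANN indexes expose comparable knobs for trading recall against latency.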
