The Computing Series

Architecture Walkthrough

Multi-Stage Pipeline

Candidate Generation

Three approaches dominate:

Collaborative filtering: items similar to what the user has interacted with (item-to-item similarity), computed using embeddings. Pre-compute item embedding vectors offline; at query time, find nearest neighbours using approximate nearest neighbour search (T2 — a tree-indexed vector space).
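A minimal sketch of the query-time step, assuming a hypothetical pre-computed embedding table (the item names and vectors are illustrative). It uses brute-force cosine similarity; a production system would swap in an approximate index such as the tree-indexed vector space of T2.

```python
import math

# Hypothetical pre-computed item embeddings (produced offline).
ITEM_EMBEDDINGS = {
    "film_a": [0.9, 0.1, 0.0],
    "film_b": [0.8, 0.2, 0.1],
    "film_c": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nearest_items(query_item, k=2):
    """Brute-force nearest neighbours over the embedding table.
    Real systems replace this scan with an approximate index."""
    query = ITEM_EMBEDDINGS[query_item]
    scored = [
        (cosine(query, emb), item)
        for item, emb in ITEM_EMBEDDINGS.items()
        if item != query_item
    ]
    return [item for _, item in sorted(scored, reverse=True)[:k]]
```

Here `film_b` (near-parallel to `film_a`) ranks above `film_c` (near-orthogonal), which is the item-to-item similarity the text describes.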

Content-based filtering: items similar in attributes to what the user prefers. A user who watches action films gets action films. This does not require interaction data — it works for new users (cold start).
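A content-based sketch under the same illustrative naming: items are ranked by attribute overlap with the user's stated preferences, so no interaction history is needed. The catalogue and attribute sets are hypothetical stand-ins for richer item metadata.

```python
# Illustrative catalogue; attribute sets stand in for real item metadata.
CATALOG = {
    "film_x": {"action", "thriller"},
    "film_y": {"action", "sci-fi"},
    "film_z": {"romance", "drama"},
}

def content_recommend(user_prefs, k=2):
    """Rank items by attribute overlap with the user's preferences.
    Works for brand-new users (cold start): no interactions required."""
    scored = sorted(
        CATALOG.items(),
        key=lambda kv: len(kv[1] & user_prefs),
        reverse=True,
    )
    return [item for item, _ in scored[:k]]
```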

Matrix factorisation: decompose the user-item interaction matrix into user embeddings and item embeddings. Each user and item is a dense vector; the dot product of a user vector and an item vector predicts the interaction probability. Pre-compute all user and item embeddings offline.
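The prediction step can be sketched as a dot product squashed to a probability; the user and item vectors below are hypothetical outputs of the offline factorisation.

```python
import math

# Hypothetical factorised embeddings (computed offline).
USER_VECS = {"u1": [0.7, -0.2, 0.5]}
ITEM_VECS = {"film_a": [0.6, 0.1, 0.4], "film_z": [-0.5, 0.9, -0.3]}

def predict(user, item):
    """Dot product of a user vector and an item vector, mapped through
    a sigmoid to give a predicted interaction probability."""
    dot = sum(a * b for a, b in zip(USER_VECS[user], ITEM_VECS[item]))
    return 1.0 / (1.0 + math.exp(-dot))
```

With these vectors, `u1` aligns with `film_a` (positive dot product) and not with `film_z`, so the predicted probability is higher for `film_a`.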

Feature Store

Candidate scoring requires features: user demographic features, item attributes, interaction history, contextual signals (time of day, device type). Features come from two stores:

Offline feature store: a data warehouse with historical features. Used for model training and for features that do not need to be fresh (user age, item category).

Online feature store: a low-latency key-value store (Redis) with pre-computed features that must be fresh for serving. User’s recent interactions, item’s current popularity, time-since-last-viewed.

Feature store architecture:
  Offline:  Data warehouse (Snowflake, BigQuery) ← batch pipelines
  Online:   Redis cluster ← streaming pipelines (Kafka → Flink → Redis)
  Serving:  Feature retrieval joins online + offline at query time
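The serving-time join can be sketched with in-memory dictionaries standing in for the two stores (a real system would query Redis and a warehouse-backed cache; the keys and feature names here are illustrative). Fresher online values take precedence when a feature name appears in both.

```python
# Illustrative stand-ins for the offline (warehouse) and online (Redis) stores.
OFFLINE = {"user:42": {"age_band": "25-34"}, "item:7": {"category": "action"}}
ONLINE = {"user:42": {"recent_items": [7, 9]}, "item:7": {"popularity": 0.83}}

def get_features(*keys):
    """Join online and offline features per key at query time.
    Online (fresh) values win if both stores carry the same feature name."""
    features = {}
    for key in keys:
        merged = dict(OFFLINE.get(key, {}))
        merged.update(ONLINE.get(key, {}))
        features[key] = merged
    return features
```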

Scoring Model

The scoring model is typically a neural network or gradient boosted tree that takes features as input and outputs a predicted engagement probability. It runs on every candidate in the shortlist (~1,000 items) at query time. Batching all 1,000 candidates into a single model call amortises the fixed per-call overhead across the whole shortlist.
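The batching idea can be shown with a toy linear scorer standing in for the real model (the weights and feature rows are illustrative): one call scores the whole candidate matrix instead of looping over per-item calls.

```python
def score_batch(feature_matrix, weights):
    """Score every candidate in one call. A real model would be a neural
    net or GBDT forward pass; batching amortises per-call overhead
    across the ~1,000-row candidate matrix."""
    return [sum(w * x for w, x in zip(weights, row)) for row in feature_matrix]

# One call, many candidates: each row is one candidate's feature vector.
scores = score_batch([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], weights=[0.5, 2.0])
```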

Re-ranking

Business rules applied after scoring: diversity (avoid 10 consecutive items from the same creator), freshness boost (prefer newer items over equally scored older items), policy filters (age-gated content, sponsored items), serendipity injection (surface a fraction of items outside predicted preferences).
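One of those rules, the diversity constraint, can be sketched as a post-scoring pass (the tuple layout and `max_run` cap are illustrative, not the book's interface): items that would extend a same-creator run past the cap are deferred to the end of the list.

```python
def rerank(scored_items, max_run=2):
    """Diversity rule sketch: break long runs from one creator.
    scored_items is a list of (score, item_id, creator), best first."""
    result, deferred = [], []
    for entry in scored_items:
        creator = entry[2]
        run = 0
        for prev in reversed(result):  # length of the current trailing run
            if prev[2] == creator:
                run += 1
            else:
                break
        (deferred if run >= max_run else result).append(entry)
    return result + deferred
```

With `max_run=2`, a third consecutive item from the same creator is pushed below the next differently sourced candidate, trading a little score for variety.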

Feedback Loop

User interactions — clicks, views, likes, skips — are fed back into the training pipeline. More interactions → better model → better recommendations → more interactions. This is T11 (Feedback) operating at the system level.

Recommendation → User interaction → Interaction log → Training data
→ Updated model → Better recommendation (loop)