A recommendation system has 5M items. Candidate generation returns 500 items. The scoring model takes 0.5ms to score a single item. (a) If items are scored sequentially, what is the scoring latency? (b) If all 500 items are batched into a single model call that takes 50ms, what is the total recommendation latency? (c) What AT code describes the tradeoff between batch size and latency?
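A quick sanity check of parts (a) and (b) can be sketched in Python. The constants come from the problem statement; the variable names are my own, and only scoring latency is modeled since no candidate-generation time is given:

```python
# Back-of-envelope scoring latency: sequential vs. batched.
N_CANDIDATES = 500
PER_ITEM_MS = 0.5      # one model call per item
BATCH_CALL_MS = 50.0   # one model call scoring all 500 candidates

sequential_ms = N_CANDIDATES * PER_ITEM_MS   # 500 * 0.5 = 250 ms
batched_ms = BATCH_CALL_MS                   # 50 ms

speedup = sequential_ms / batched_ms         # batching is 5x faster here
print(sequential_ms, batched_ms, speedup)
```

Note the tradeoff the question is pointing at: a larger batch amortizes per-call overhead, but the whole batch must finish before any item's score is available.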
A feature store records a user's last 10 interactions with a TTL of 24 hours. A user interacts with a product and then immediately refreshes their recommendations. (a) How fresh is the interaction feature? (b) If the feature lives in the offline store, updated by a 6-hour batch pipeline, what FM code describes the staleness? (c) What store type and pipeline would you use to serve sub-second freshness?
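The setup in this question can be made concrete with a minimal in-memory sketch of an online feature store: last-10 interactions per user, expired after 24 hours. All class and method names here are hypothetical; a production system would use something like Redis or a managed feature store rather than a Python dict:

```python
import time

TTL_SECONDS = 24 * 3600   # 24-hour TTL from the problem statement
MAX_INTERACTIONS = 10     # keep only the last 10 interactions

class OnlineFeatureStore:
    """Illustrative online store: per-user interaction lists with a TTL."""

    def __init__(self):
        self._data = {}  # user_id -> list of (timestamp, item_id)

    def record(self, user_id, item_id, now=None):
        now = time.time() if now is None else now
        events = self._data.setdefault(user_id, [])
        events.append((now, item_id))
        del events[:-MAX_INTERACTIONS]  # retain most recent 10 only

    def read(self, user_id, now=None):
        now = time.time() if now is None else now
        events = self._data.get(user_id, [])
        # Filter out interactions older than the TTL at read time.
        return [(t, i) for (t, i) in events if now - t <= TTL_SECONDS]

store = OnlineFeatureStore()
store.record("u1", "product_42", now=1000.0)
# Immediate refresh: the write is visible within the same request cycle.
print(store.read("u1", now=1000.5))
```

The contrast with part (b) is that an offline store fed by a 6-hour batch pipeline would not surface this interaction until the next pipeline run.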
A complete answer will:

(1) design a three-stage pipeline: candidate generation (ANN retrieval from the user embedding against 60M track embeddings, retrieving the top-1000 candidates per playlist), scoring (a ranking model applied to candidates using user-history features and track metadata), and re-ranking (diversity injection to ensure new discoveries meet a stated fraction of each playlist, e.g. 20% new tracks), with a concrete 4-hour budget estimate across 50M users showing the pipeline must parallelise across a compute cluster;

(2) name FM4 (stale data / training-serving skew) and identify its specific form here: the training data distribution differs from the serving distribution because the model trained on historical plays, which are biased toward tracks that were already popular; the mitigation is logging exploration plays separately and retraining on a debiased dataset using inverse propensity scoring;

(3) describe the feedback loop and its degradation signal: if the model is retrained on its own recommendations, it amplifies whatever genres it initially surfaced, narrowing diversity over time, which is detectable by tracking the entropy of the genre distribution in generated playlists across weekly model retraining cycles; and

(4) propose a concrete discovery mechanism with its AT9 tradeoff: injecting tracks from underexplored genres at re-ranking improves discovery but degrades short-term engagement (measured by skip rate); the answer must state an explicit exploration rate and how it is tuned against engagement metrics.
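The degradation signal named in the rubric, entropy of the genre distribution in generated playlists, can be sketched directly. This is a minimal illustration: the genre labels and weekly samples below are invented to show how a collapsing distribution registers as falling entropy across retraining cycles:

```python
import math
from collections import Counter

def genre_entropy(genres):
    """Shannon entropy (in bits) of a list of genre labels."""
    counts = Counter(genres)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical genre samples from playlists generated in two retraining cycles.
week_1 = ["pop", "rock", "jazz", "hip-hop"] * 25   # uniform over 4 genres
week_8 = ["pop"] * 85 + ["rock"] * 15              # collapsed toward pop

print(round(genre_entropy(week_1), 3))  # 2.0 bits: the maximum for 4 genres
print(round(genre_entropy(week_8), 3))  # well below 1 bit: diversity narrowed
```

A monitoring job would compute this weekly over a sample of generated playlists and alert on a sustained downward trend, which is the feedback-loop signature described in point (3).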