The Computing Series

In Practice — Designing a News Feed

A product team is designing a personalised news feed. Users follow other users. When someone posts, their followers see it in their feed. The feed must load in under 100ms. There are ten million users, with the top 0.1% having more than 100,000 followers.

Here is every architecture tradeoff this product forces, named explicitly.


AT4 — Precomputation vs On-Demand (the defining choice)

The central question: do you assemble each user’s feed when someone posts (fan-out-on-write), or when the user opens the app (fan-out-on-read)?

Fan-out-on-write: when a user posts, immediately push that post into every follower’s feed. Reads are instant — the feed is precomputed. Writes are expensive — a post from a user with 100,000 followers triggers 100,000 writes.

Fan-out-on-read: store only posts. At read time, query the posts of everyone the user follows and merge them. Writes are cheap. Reads are expensive — the merge happens on every load, and its cost grows with follow count: a user who follows 500 accounts forces a 500-way query and merge.

Decision: Fan-out-on-write for users with fewer than 10,000 followers. Fan-out-on-read for accounts above that threshold (the “celebrity problem”). Hybrid: most users get precomputed feeds; celebrity posts are injected at read time. This is the design Twitter described publicly in 2013.
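The hybrid can be sketched in a few lines. Everything here is hypothetical — in-memory dicts stand in for the real stores, and recency is approximated by post ID order:

```python
# Hybrid fan-out sketch. All names and stores are illustrative, not a real API.
CELEBRITY_THRESHOLD = 10_000

feeds = {}            # follower_id -> list of post_ids (precomputed feeds)
posts_by_author = {}  # author_id -> list of post_ids (raw post store)
followers = {}        # author_id -> set of follower ids
following = {}        # user_id -> set of followed ids

def publish(author_id, post_id):
    posts_by_author.setdefault(author_id, []).append(post_id)
    if len(followers.get(author_id, ())) < CELEBRITY_THRESHOLD:
        # fan-out-on-write: push into every follower's precomputed feed
        for f in followers.get(author_id, ()):
            feeds.setdefault(f, []).append(post_id)
    # celebrity posts are NOT fanned out; they are merged at read time

def read_feed(user_id, limit=50):
    merged = list(feeds.get(user_id, []))
    # inject posts from followed celebrities at read time
    for a in following.get(user_id, ()):
        if len(followers.get(a, ())) >= CELEBRITY_THRESHOLD:
            merged.extend(posts_by_author.get(a, []))
    # higher post_id stands in for "more recent" in this sketch
    return sorted(merged, reverse=True)[:limit]
```

The write path pays the fan-out cost only when it is bounded; the read path pays a small merge cost only for the handful of celebrity accounts a typical user follows.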

AT1 — Consistency vs Availability (set toward availability)

Is it acceptable for a post to appear in a follower’s feed 2 seconds after it was written? Yes, for a social feed. The feed is not a financial ledger. Eventual consistency is the correct model. A follower who sees a post 3 seconds late is not harmed.

Implication: The feed store does not need synchronous replication. Writes to follower feeds can be processed asynchronously. If the fan-out worker falls behind, the post arrives late — not lost, not inconsistent, just delayed.

AT10 — Synchronous vs Asynchronous (write path is async)

The post creation API returns success as soon as the post is stored. Fan-out to follower feeds happens asynchronously via a message queue. The caller — the user who posted — does not wait for 100,000 feed writes to complete.

Cost: The call stack is broken at the queue boundary. Debugging a fan-out failure requires tracing across an async boundary. Testing requires waiting for async delivery. The latency benefit — the author’s post API returns in under 50ms regardless of follower count — justifies this complexity.
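A minimal sketch of the async boundary, using Python's standard-library queue and a worker thread as stand-ins for the real message queue and fan-out workers (all names are illustrative):

```python
import queue
import threading

fanout_queue = queue.Queue()   # stand-in for the real message queue
feeds = {}                     # follower_id -> list of post_ids
followers = {2: [1, 3]}        # hypothetical follow graph

def create_post(author_id, post_id):
    # The post API: enqueue the fan-out and return immediately.
    # The caller never waits on follower writes.
    fanout_queue.put((author_id, post_id))
    return {"status": "ok", "post_id": post_id}

def fanout_worker():
    while True:
        item = fanout_queue.get()
        if item is None:       # shutdown sentinel
            break
        author_id, post_id = item
        for f in followers.get(author_id, []):
            feeds.setdefault(f, []).append(post_id)
        fanout_queue.task_done()

worker = threading.Thread(target=fanout_worker, daemon=True)
worker.start()
```

Note that a test of this code must wait on the queue (`fanout_queue.join()`) before asserting on feed contents — exactly the async-testing cost the text names.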

AT2 — Latency vs Throughput (feed read is latency-bound; fan-out is throughput-bound)

Feed reads must return in under 100ms. This drives the precomputation decision above: the feed must be preassembled because assembling it on demand is too slow at P99.

Fan-out is throughput-bound, not latency-bound. Processing 100,000 fan-out writes per post does not need to be fast per write — it needs total throughput to keep up with post volume. A queue with multiple worker instances handles this without a latency constraint per individual write.

AT5 — Centralisation vs Distribution (feed storage is distributed)

Storing precomputed feeds for 10 million users in a single Redis instance is a single point of failure (SPOF) and a capacity ceiling. Feed storage is sharded by hashed user ID across a Redis cluster — each shard owns a slice of the ID space and the feeds of the users in it. Reads and writes for a given user always go to the same shard.

Cost: Hot shards if load across the ID space is uneven. Consistent hashing with virtual nodes, which scatters each physical shard's ownership across many small slices, mitigates this.
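One way to get virtual nodes is a consistent-hash ring, sketched below (the shard names, vnode count, and hash choice are illustrative):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Any stable hash works; md5 is used here only for determinism.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ShardRing:
    """Consistent-hash ring: each physical shard owns many virtual nodes,
    so its ownership is scattered in small slices around the ring."""

    def __init__(self, shards, vnodes=100):
        self.ring = sorted(
            (_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    def shard_for(self, user_id) -> str:
        # First virtual node clockwise of the user's hash owns the user.
        i = bisect.bisect(self.keys, _hash(str(user_id))) % len(self.ring)
        return self.ring[i][1]
```

The same user always maps to the same shard, and adding or removing a physical shard moves only the keys in its slices rather than rehashing everything.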

AT6 — Generality vs Specialisation (dedicated feed store, not Postgres)

A relational database can model a feed — a table of (user_id, post_id, timestamp) rows. But the feed's dominant query — the top 50 posts for a user, sorted by timestamp, at 10 million reads per day — is a pattern a general-purpose SQL database handles less efficiently than a sorted-set store (Redis ZSET: user_id → sorted set of post references). The specialised data structure exists precisely for this query pattern.

Cost: Redis is another operational system. It must be monitored, backed up, and tuned separately from the primary database.
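A pure-Python stand-in for the sorted-set feed store, with the equivalent Redis commands noted in comments (the class and method names are hypothetical; a real deployment would issue these commands against the cluster):

```python
class FeedStore:
    """In-memory stand-in for per-user Redis ZSETs, keyed feed:<user_id>."""

    def __init__(self):
        self._feeds = {}  # user_id -> {post_id: score}

    def push(self, user_id, post_id, timestamp):
        # Redis: ZADD feed:<user_id> <timestamp> <post_id>
        self._feeds.setdefault(user_id, {})[post_id] = timestamp

    def top(self, user_id, n=50):
        # Redis: ZREVRANGE feed:<user_id> 0 <n-1>
        entries = self._feeds.get(user_id, {})
        return sorted(entries, key=entries.get, reverse=True)[:n]

    def trim(self, user_id, keep=800):
        # Cap feed length so storage stays bounded.
        # Redis: ZREMRANGEBYRANK feed:<user_id> 0 -(keep+1)
        entries = self._feeds.get(user_id, {})
        for post_id in sorted(entries, key=entries.get, reverse=True)[keep:]:
            del entries[post_id]
```

The "top N by score" read is O(log n + N) in Redis's skip-list implementation — the whole reason the specialised store wins here.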

AT3 — Simplicity vs Flexibility (evolve incrementally)

Fan-out-on-read is simpler to build first. There is no fan-out worker, no queue, no precomputed feed store. Query the posts of followed accounts at read time. This is correct for 100,000 users. It becomes too slow at 2 million users when the follow graph is dense.

Decision sequence: Build fan-out-on-read first. Instrument read latency. When P99 exceeds the budget, migrate to the hybrid model. Gall’s Law applies — start with the simple system that works.
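The simple first system really is just a merge. A sketch, assuming per-author timelines already sorted newest-first (the data and names here are illustrative):

```python
import heapq
import itertools

# author_id -> posts as (timestamp, post_id), sorted newest-first
posts_by_author = {
    "alice": [(300, "a2"), (100, "a1")],
    "bob":   [(250, "b1")],
}

def read_feed(followed, limit=50):
    # k-way merge of the already-sorted per-author timelines. At read time
    # this touches every followed account — the cost that grows with the
    # follow graph and eventually breaks the latency budget.
    streams = (posts_by_author.get(a, []) for a in followed)
    merged = heapq.merge(*streams, key=lambda p: p[0], reverse=True)
    return [post_id for _, post_id in itertools.islice(merged, limit)]
```

No queue, no worker, no precomputed store — and instrumenting this function's latency is what tells you when the hybrid migration is due.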


Summary:

Tradeoff | Direction | Reason
AT4 Precomputation vs On-Demand | Precomputation (hybrid) | Reads must be sub-100ms; write cost is acceptable for non-celebrity accounts
AT1 Consistency vs Availability | Availability | Feed staleness of seconds is acceptable; a late or duplicated post does no harm
AT10 Synchronous vs Asynchronous | Asynchronous fan-out | Post API must not block on 100,000 follower writes
AT2 Latency vs Throughput | Latency (reads), throughput (fan-out) | Different constraints on different paths
AT5 Centralisation vs Distribution | Distribution | A single feed store is a SPOF and a capacity ceiling
AT6 Generality vs Specialisation | Specialisation (Redis ZSET) | The query pattern justifies a dedicated data structure
AT3 Simplicity vs Flexibility | Simplicity first, complexity earned | Build fan-out-on-read; migrate to hybrid when latency data demands it

Seven of the ten tradeoffs appear in one product decision. None of them were optional. All of them were made implicitly in the original design — the question is only whether they were made explicitly, with the cost side named.

Read in the book →