Thundering Herd: Why Caches Make Outages Worse

In 2013, an Instagram engineering team pushed a deployment during peak traffic hours. The deployment restarted a set of application servers. Those servers had warm, in-memory caches. After the restart, every cache was cold.

Thousands of requests that had been served from cache now hit the database simultaneously. The database connection pool saturated in seconds. Queries backed up. Timeouts cascaded to upstream services. The entire read path degraded for millions of users.

The cache was supposed to protect the database. Instead, the cache's sudden absence nearly took the database down.

What Happened

A cache entry for a popular resource — a user profile, a trending post, a feed fragment — serves thousands of reads per second. The database handles zero of those reads. The cache absorbs them all.

When that entry disappears, every concurrent request discovers the miss at the same moment. Each request independently decides to fetch from the database. Each opens a connection, issues a query, and waits for the result.

Normal (cache warm):           Cache miss (all at once):

Client → Cache → return        Client ───────┐
Client → Cache → return        Client ───────┤
Client → Cache → return        Client ───────┼──→ Database  ← saturated
Client → Cache → return        Client ───────┤
Client → Cache → return        Client ───────┘

0 DB queries for the hot key   5,000 DB queries, same row, same second

If 5,000 requests per second hit a single cache key, and the cache entry vanishes, the database receives 5,000 queries for the same row in the same second. The database was handling zero queries for that row. Now it handles 5,000. The connection pool fills. Legitimate queries queue behind the herd. Latency spikes across the board.

Why It Was Not Obvious

Engineers design caches to improve the common case. The common case is a cache hit. Cache hits are fast, cheap, and invisible to the database. The system appears healthy because the cache absorbs all the load.

The failure case — a cache miss under high concurrency — is invisible during normal operation. Load tests rarely simulate simultaneous expiry of hot keys. Monitoring dashboards show hit ratios, not miss amplification.

The mental model "cache reduces database load" is correct on average and catastrophically wrong at the moment it matters most. The moment a popular entry expires is precisely when the database loses its protection and takes the full read load for that key.

The Absent Principle

The herd forms because each request acts independently. Request A discovers the miss. Request B discovers the miss. Both fetch from the database. Both write back to the cache. Request B's work is entirely redundant.

The fix is single-flight cache population, also called request coalescing: for each missing key, only one request fetches from the database. All others wait for that single fetch to complete. The result is written to the cache once. Subsequent requests read the cached value.

Without this property, every miss is amplified by the number of requests that arrive before the cache is repopulated. A key serving 10,000 requests per second sends up to 10,000 redundant queries to the database for every second the entry stays missing.
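The amplification can be reproduced in a few lines. The following is a minimal sketch, not a benchmark: the names simulate_naive_miss, fetch_from_db, and the key "post:42" are hypothetical, the cache is a plain dictionary, and a thread barrier is used only to force every request to observe the miss before any of them repopulates the entry.

```python
import threading


def simulate_naive_miss(concurrency: int) -> int:
    """Return how many database queries one cold key triggers
    when every request handles the miss independently."""
    cache = {}
    db_queries = 0
    counter_lock = threading.Lock()
    # The barrier holds each thread after its cache check, so all of them
    # see the miss before any of them writes the value back.
    all_checked = threading.Barrier(concurrency)

    def fetch_from_db(key: str) -> str:
        # Stand-in for a real query; it only counts how often it is called.
        nonlocal db_queries
        with counter_lock:
            db_queries += 1
        return f"row-for-{key}"

    def handle_request(key: str) -> None:
        value = cache.get(key)
        all_checked.wait()
        if value is None:
            # Each request fetches and repopulates on its own: the herd.
            cache[key] = fetch_from_db(key)

    threads = [
        threading.Thread(target=handle_request, args=("post:42",))
        for _ in range(concurrency)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db_queries
```

With 50 concurrent requests, simulate_naive_miss(50) issues 50 queries for the same row. Only the first was needed.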

Three Prevention Patterns

1. Jitter on TTL

The simplest prevention. Instead of all cache entries expiring at a fixed TTL, add a random offset. A 300-second TTL becomes 270–330 seconds per entry.

Hot keys no longer expire at the same instant. The herd fragments into smaller groups spread across the jitter window. Each group is small enough for the database to handle.

Jitter prevents simultaneous expiry across many keys. For single-key protection, you need the next pattern.
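A jittered TTL can be sketched as a single helper. The function name jittered_ttl and the spread parameter are assumptions made for illustration; a spread of 0.10 reproduces the 270 to 330 second window from the example above.

```python
import random
from typing import Optional


def jittered_ttl(base_ttl_s: float, spread: float = 0.10,
                 rng: Optional[random.Random] = None) -> float:
    """Scale base_ttl_s by a random factor in [1 - spread, 1 + spread]."""
    if not 0.0 <= spread < 1.0:
        raise ValueError("spread must be in [0, 1)")
    # Fall back to the module-level generator when no seeded one is given.
    source = rng if rng is not None else random
    return base_ttl_s * source.uniform(1.0 - spread, 1.0 + spread)
```

Each cache write would call jittered_ttl(300) instead of passing a constant 300, so entries written at the same moment expire at different moments.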

2. Request Coalescing (Mutex)

When a cache miss occurs, the first request acquires a lock on that cache key. It fetches from the database and populates the cache. All other requests for the same key wait for the lock to release, then read the freshly populated cache.

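Implementation: a distributed lock keyed by the cache key, with a TTL longer than the expected database fetch time. The TTL must outlast the fetch so the lock does not expire mid-query, and it must be finite so a crashed lock holder cannot block the key forever. Only one request hits the database per cache miss. The amplification factor drops from N (concurrent requests) to 1.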

The cost: added latency for the waiting requests. This is the correct tradeoff. One slow fetch plus N fast cache reads is better than N simultaneous slow fetches that crash the database.
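The coalescing logic can be sketched within a single process. This is a simplified illustration under stated assumptions: the class name CoalescingCache and the loader callback are hypothetical, a per-key threading.Lock stands in for the distributed lock described above, and entries never expire. Across multiple application servers the same shape would need a real distributed lock with a TTL.

```python
import threading
from typing import Callable, Dict


class CoalescingCache:
    """Read-through cache in which one caller per key performs the fetch."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader            # stand-in for the database query
        self._values: Dict[str, str] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()   # protects the _key_locks dict

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> str:
        value = self._values.get(key)
        if value is not None:
            return value                 # fast path: cache hit
        with self._lock_for(key):        # one thread per key proceeds at a time
            # Re-check: an earlier lock holder may have filled the entry.
            value = self._values.get(key)
            if value is None:
                value = self._loader(key)
                self._values[key] = value
            return value
```

The second lookup inside the lock is the important step. Without it, each waiting request would run the loader in turn after acquiring the lock, and the herd would be serialized rather than eliminated. The sketch never removes per-key locks, which a long-running service would need to address.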

3. Background Refresh

The cache entry refreshes before it expires. A background process re-fetches from the database while the current value is still valid. No request ever sees a miss for a hot key. The database receives a single query per refresh cycle, not a burst on expiry.

The cost: the cache serves slightly stale data during the refresh window. For most use cases — social feeds, product listings, leaderboards — this is acceptable. For financial balances, it is not.

Background refresh requires knowing which keys are hot. Refreshing every key wastes resources. Refreshing only keys with high hit rates targets the entries most likely to cause a herd.
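One way to combine refresh with hot-key selection is sketched below. The class name RefreshAheadCache, the hot_threshold parameter, and the loader callback are all hypothetical, and the sketch omits expiry entirely: it assumes the refresh interval is shorter than whatever TTL the real cache applies. Hit counts are reset every cycle, so "hot" here means "read at least hot_threshold times since the last refresh".

```python
import threading
from collections import Counter
from typing import Callable, Dict, List


class RefreshAheadCache:
    """Cache that re-fetches frequently read keys on a background cycle."""

    def __init__(self, loader: Callable[[str], str], hot_threshold: int = 100) -> None:
        self._loader = loader              # stand-in for the database query
        self._hot_threshold = hot_threshold
        self._values: Dict[str, str] = {}
        self._hits: Counter = Counter()
        self._lock = threading.Lock()

    def get(self, key: str) -> str:
        with self._lock:
            if key in self._values:
                self._hits[key] += 1
                return self._values[key]
        value = self._loader(key)          # cold key: ordinary read-through
        with self._lock:
            self._values[key] = value
        return value

    def refresh_once(self) -> List[str]:
        """Re-fetch keys whose hits since the last cycle reached the threshold."""
        with self._lock:
            hot = [k for k, n in self._hits.items() if n >= self._hot_threshold]
            self._hits.clear()
        for key in hot:
            fresh = self._loader(key)      # one query per hot key per cycle
            with self._lock:
                self._values[key] = fresh  # readers saw the old value until now
        return hot

    def run_forever(self, interval_s: float, stop: threading.Event) -> None:
        # Event.wait returns False on timeout, so the loop runs until stop is set.
        while not stop.wait(interval_s):
            self.refresh_once()
```

In use, run_forever would be started on a daemon thread with an interval shorter than the cache TTL. Readers keep receiving the previous value while the loader runs, which is the staleness cost noted above.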

The Most Exposed Systems

FM7, the thundering herd, hits hardest on social platforms. These systems have two properties that maximize thundering herd risk.

High fan-out. A single post reaches millions of followers. The cache entry for a trending post serves extreme read volume. One miss produces a massive herd.

Hot keys. Celebrity profiles, viral posts, and trending topics concentrate traffic on a small number of cache keys. Read traffic follows a heavy-tailed, Pareto-like distribution: a small fraction of keys, on the order of 1%, can serve half of all reads. That small fraction produces the worst herds.

The Compounding Danger

FM7 rarely acts alone. The herd exhausts database connections. Database timeouts cascade to application servers. If monitoring does not alert on cache miss rate, the team discovers the problem from user complaints — by which point recovery takes hours, not minutes.

The signal: cache hit ratio drops suddenly while request volume stays constant. If you see this pattern, the herd has already formed.
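The signal can be expressed as a check over two consecutive monitoring windows. This is a sketch with made-up names and thresholds: Window, herd_suspected, the 0.10 hit-ratio drop, and the 20% volume tolerance are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Cache counters collected over one monitoring interval."""
    hits: int
    misses: int

    @property
    def requests(self) -> int:
        return self.hits + self.misses

    @property
    def hit_ratio(self) -> float:
        # An empty window is treated as fully healthy.
        return self.hits / self.requests if self.requests else 1.0


def herd_suspected(prev: Window, curr: Window,
                   max_ratio_drop: float = 0.10,
                   volume_tolerance: float = 0.20) -> bool:
    """True when the hit ratio falls sharply while request volume stays flat."""
    if prev.requests == 0:
        return False
    ratio_drop = prev.hit_ratio - curr.hit_ratio
    volume_change = abs(curr.requests - prev.requests) / prev.requests
    return ratio_drop > max_ratio_drop and volume_change <= volume_tolerance
```

For example, herd_suspected(Window(9900, 100), Window(6000, 4000)) returns True: the hit ratio fell from 0.99 to 0.60 at identical volume. A drop in hit ratio that coincides with a large change in traffic is excluded, since it more likely reflects a shift in demand than a lost cache.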

Concept: FM7 (Thundering Herd)

Tradeoff: AT4 — the cache is precomputed data; when it vanishes, every request falls back to on-demand computation simultaneously

Failure Mode: FM7 compounds with resource exhaustion, cascading failure, and observability blindness to convert a cache miss into a system-wide outage

Signal: Cache hit ratio drops suddenly while traffic stays constant; connection pool hits 100% after a deployment

Series: Book 3, Ch 6