The Cache Layer: When It Helps and When It Hurts

Your database handles 5,000 reads per second. Tonight 50,000 arrive. The cache expires at the wrong moment.

Every engineer has been in this room. The cache solved the scaling problem for six months. Then the cache became the scaling problem. Understanding why requires understanding what a cache guarantees and what it does not.

What It Does

A cache stores recently or frequently accessed data in memory. Memory reads complete in microseconds. Database reads typically take milliseconds. That gap of roughly three orders of magnitude is the entire value proposition.

Request path (with cache):

Client → Load Balancer → App Server → Cache ──(hit)──→ return µs response
                                           │
                                         (miss)
                                           │
                                        Database → populate cache → return

A cache sits between your application and your data store. On a hit, it returns data in microseconds. On a miss, the application reads from the database, stores the result in the cache, and returns it. Subsequent requests for the same data hit the cache until the entry expires or is evicted.
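The read path above can be sketched in a few lines. This is a minimal illustration, not a production client: `TTLCache` is a hypothetical in-process stand-in for Redis or Memcached, and `get_with_cache` and `load_user` are names invented for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple

MISS = object()  # sentinel, so a cached None still counts as a hit


class TTLCache:
    """Hypothetical in-process cache standing in for Redis or Memcached."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            return MISS
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired entries behave like misses
            return MISS
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._entries[key] = (value, self._clock() + ttl_seconds)


def get_with_cache(cache: TTLCache, key: str,
                   load_from_db: Callable[[str], Any], ttl_seconds: float) -> Any:
    value = cache.get(key)
    if value is not MISS:
        return value                     # hit: no database work
    value = load_from_db(key)            # miss: fall through to the database
    cache.set(key, value, ttl_seconds)   # populate for subsequent requests
    return value


db_reads = []  # records every call that reached the "database"


def load_user(key: str) -> dict:
    db_reads.append(key)
    return {"key": key, "email": "a@example.com"}


cache = TTLCache()
first = get_with_cache(cache, "user:1", load_user, ttl_seconds=60)
second = get_with_cache(cache, "user:1", load_user, ttl_seconds=60)
print(len(db_reads))  # 1: the second read was served from the cache
```

The sentinel matters more than it looks. Without it, a legitimately cached `None` is indistinguishable from a miss, and every read of that key goes to the database.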

The Guarantee

Microsecond reads for hot data. A single Redis node can serve on the order of 100,000 reads per second, depending on hardware and payload size. The guarantee is speed, not correctness.

The No-Guarantee

The cache does not guarantee consistency with the source of truth. The cache may serve stale data. Always. This is not a bug. It is the fundamental tradeoff.

Cached data was fetched earlier and stored for reuse. The staleness window is the time between the source changing and the cache reflecting that change. You control this window with TTL. You never eliminate it.
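A small simulation makes the staleness window concrete. This is a sketch under invented assumptions: the clock is a plain variable advanced by hand, the TTL is 60 seconds, and `source`, `cache`, and `read` are names made up for the example.

```python
from typing import Dict, Tuple

TTL_SECONDS = 60.0
now = 0.0                                 # simulated clock, in seconds
source = {"greeting": "v1"}               # stand-in for the source of truth
cache: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)


def read(key: str) -> str:
    entry = cache.get(key)
    if entry is not None and now < entry[1]:
        return entry[0]                   # served from cache, possibly stale
    value = source[key]                   # expired or absent: reload
    cache[key] = (value, now + TTL_SECONDS)
    return value


observed = []

now = 0.0
observed.append(read("greeting"))   # cached at t=0, expires at t=60

now = 10.0
source["greeting"] = "v2"           # the source changes at t=10
observed.append(read("greeting"))   # still the cached value

now = 59.0
observed.append(read("greeting"))   # still inside the TTL

now = 60.0
observed.append(read("greeting"))   # entry expired, reload from source

print(observed)  # ['v1', 'v1', 'v1', 'v2']
```

Here the source changed at t=10 and the cache caught up at t=60, so the staleness window was 50 seconds. A shorter TTL shrinks the window. It does not close it.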

Two Failure Modes

The Thundering Herd on Expiry

A popular cache key expires. One thousand requests arrive in the same millisecond. All one thousand see a cache miss. All one thousand query the database simultaneously. The database, sized for 50 queries per second on this key, receives 1,000 at once.

The cache was protecting the database. The moment the cache key expired, the protection disappeared. The traffic that the cache was absorbing lands on the database in a single burst.

Prevention: request coalescing. The first miss triggers a database query. Subsequent misses for the same key wait for the first query to complete. Add jitter to TTL values so keys do not expire simultaneously.
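One way to sketch both preventions, assuming a threaded application server. `CoalescingLoader`, `jittered_ttl`, and `slow_query` are hypothetical names for this example, and the 10% jitter spread is an arbitrary choice, not a recommendation.

```python
import random
import threading
import time
from typing import Any, Callable, Dict, Optional


class _Call:
    """State shared between the thread doing the load and the threads waiting on it."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[BaseException] = None


class CoalescingLoader:
    """Allows at most one in-flight load per key; concurrent callers wait for it."""

    def __init__(self, load_from_db: Callable[[str], Any]) -> None:
        self._load_from_db = load_from_db
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _Call] = {}

    def load(self, key: str) -> Any:
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call
        if leader:
            try:
                call.value = self._load_from_db(key)  # the only database query
            except BaseException as exc:
                call.error = exc                       # waiters see the same failure
            finally:
                with self._lock:
                    del self._in_flight[key]
                call.done.set()                        # wake every waiter
        else:
            call.done.wait()                           # reuse the leader's result
        if call.error is not None:
            raise call.error
        return call.value


def jittered_ttl(base_seconds: float, spread: float = 0.1) -> float:
    """Vary the TTL by up to +/- spread so keys written together expire apart."""
    return base_seconds * random.uniform(1 - spread, 1 + spread)


calls = []  # records every query that reached the "database"


def slow_query(key: str) -> str:
    calls.append(key)
    time.sleep(0.5)  # simulate a slow database read
    return f"value-for-{key}"


loader = CoalescingLoader(slow_query)
results = []
start = threading.Barrier(8)  # release all workers at the same moment


def worker() -> None:
    start.wait()
    results.append(loader.load("hot-key"))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In this run, eight concurrent misses should produce a single entry in `calls`, with all eight workers receiving the same value. The coalescing here is per process. With many app servers, each server can still send one query per expired key, which is usually acceptable; coalescing across servers would need a shared lock and is outside this sketch.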

Stale Data Served as Fresh

A user updates their email address. The write goes to the database. The cache still holds the old email. With a 300-second TTL, reads can return the old value for up to 300 seconds.

Prevention: cache invalidation on write. When the application writes to the database, it also deletes or updates the corresponding cache key. This reduces the staleness window to near-zero but introduces complexity: the write must now succeed in two places. If the cache invalidation fails, stale data persists until the TTL expires.
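A minimal sketch of delete-on-write, using plain dictionaries as stand-ins for the database and the cache. The key format and the function names `read_email` and `update_email` are assumptions for the example, and TTL handling is left out to keep the write path in focus.

```python
from typing import Dict

database: Dict[str, str] = {}  # stand-in for the source of truth
cache: Dict[str, str] = {}     # stand-in for Redis or Memcached (TTL omitted)


def email_key(user_id: int) -> str:
    return f"user:{user_id}:email"


def read_email(user_id: int) -> str:
    key = email_key(user_id)
    if key in cache:
        return cache[key]        # hit
    value = database[key]        # miss: read the source of truth
    cache[key] = value
    return value


def update_email(user_id: int, new_email: str) -> None:
    key = email_key(user_id)
    database[key] = new_email    # 1. write the source of truth first
    # 2. Then delete the cached copy; the next read repopulates it.
    #    With a real networked cache this call can fail independently of
    #    the database write. The TTL is the backstop when it does.
    cache.pop(key, None)


database[email_key(42)] = "old@example.com"
before = read_email(42)          # caches the old address
update_email(42, "new@example.com")
after = read_email(42)           # miss, then reload from the database
print(before, after)  # old@example.com new@example.com
```

Deleting the key rather than overwriting it is a deliberate choice in this sketch. Two concurrent writers cannot leave the cache holding the loser's value if neither of them writes to the cache at all.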

The Config Decision: TTL

TTL is the single most important cache configuration. It determines how long a cached value survives before automatic expiry.

TTL too short: The cache expires frequently. Hit rates drop. The database absorbs more traffic. You paid for a cache that barely helps.

TTL too long: The cache holds stale data for extended periods. Users see outdated information. Business logic operates on wrong values.

There is no correct TTL. There is only the TTL that matches your staleness tolerance. A social media profile photo can tolerate 5 minutes of staleness. An account balance cannot tolerate 5 seconds. Set TTL per key pattern, not globally.
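Per-pattern TTLs can be as simple as a lookup table. In this sketch only the 300-second profile photo value comes from the text; the key patterns, the other TTLs, and the default are hypothetical placeholders to be replaced with your own staleness tolerances.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

# (key pattern, TTL in seconds). First match wins, so order specific rules first.
TTL_RULES: List[Tuple[str, int]] = [
    ("account:balance:*", 1),   # hypothetical: near-zero staleness tolerance
    ("session:*", 30),          # hypothetical
    ("profile:photo:*", 300),   # 5 minutes of staleness is acceptable
]
DEFAULT_TTL_SECONDS = 60        # hypothetical fallback for unmatched keys


def ttl_for(key: str) -> int:
    """Return the TTL for a cache key based on the first matching pattern."""
    for pattern, ttl_seconds in TTL_RULES:
        if fnmatchcase(key, pattern):
            return ttl_seconds
    return DEFAULT_TTL_SECONDS


print(ttl_for("profile:photo:42"))    # 300
print(ttl_for("account:balance:42"))  # 1
print(ttl_for("search:results:abc"))  # 60
```

Keeping the table in one place makes the staleness decision visible in code review. A TTL buried in a call site tends to be copied without anyone asking whether it fits the new key.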

Real Systems

Redis. In-memory data store with data structures beyond simple key-value: strings, hashes, lists, sets, sorted sets. Offers persistence options. Single-threaded command execution avoids lock contention between commands. One of the most widely deployed application caches in production.

Memcached. Pure key-value cache. No persistence. No data structures beyond strings. Multi-threaded, so it can deliver somewhat higher throughput on multi-core machines for simple get/set workloads. Choose Memcached when you need only caching with no data structure operations.

CDN edge caches. Cache HTTP responses at geographic edge locations. The same TTL and staleness tradeoffs apply. Cache misses fall through to your origin server. Edge caches are the first cache layer a request reaches after it leaves the client.

Concept: Cache Layer

Tradeoff: AT4 — short TTL means low staleness but low hit rate; long TTL means high hit rate but stale data

Failure Mode: FM7 — when a popular key expires, all requests simultaneously hit the unprotected database

Signal: Your database read load exceeds capacity and most reads request the same data repeatedly

Series: Book 3, Ch 6