This is the most important chapter in Books 1 and 2.
Not because the algorithm is complicated — it is not. The algorithm is simple: before computing something, check whether you have already computed it. If yes, return the stored result. If no, compute it, store the result, and return.
Every piece of infrastructure you will study in Book 3 — caching layers, CDNs, read replicas, materialized views — is built on this idea. And every engineer who understands the idea at the algorithm level will understand infrastructure caching at the design level, without memorising a catalogue of systems.
In Book 1, Chapter 21, you added a dictionary to a recursive function and called it memoization. You saw it reduce an exponential-time computation to linear. Now you will see that the dictionary is a cache. The subproblem is the cache key. The computed result is the cached value. The only difference between a DP memo table and a Redis instance is this: Redis cannot rely on the underlying data remaining constant. The problem that follows from that difference is where all the complexity lives.
In Book 1, Ch 21, memoization transformed the Fibonacci function from
O(2ⁿ) to O(N) by storing results in a dictionary. The pattern: check if
memo[key] exists; if yes, return it; if no, compute, store,
return.
This chapter is the explicit bridge for Thread 5 (Caching/Memoization): DP memoization (Book 1, Ch 21) → infrastructure caching (this chapter) → CDN and cache layer (Book 3, Ch 6–7). The mathematical structure is identical. The engineering problem added by infrastructure caching is cache invalidation — the memo table assumes the underlying function is pure (same input → always same output). Infrastructure caches must handle the case where the underlying data changes.
Memoization in Book 1:
def fib(n: int, memo: dict = {}) -> int:
if n in memo:
return memo[n] # cache hit
if n <= 1:
return n
result = fib(n-1, memo) + fib(n-2, memo)
memo[n] = result # cache fill
return resultInfrastructure caching in production:
import redis
r = redis.Redis()
def get_user_profile(user_id: int) -> dict:
cache_key = f"user:{user_id}:profile"
cached = r.get(cache_key)
if cached:
return json.loads(cached) # cache hit
profile = db.query(f"SELECT * FROM users WHERE id={user_id}")
r.setex(cache_key, 300, json.dumps(profile)) # cache fill, TTL=300s
return profileThe structure is identical. The difference: - fib(n) is
a pure function: the same input always produces the
same output. The memo table never needs invalidation. -
get_user_profile(user_id) reads from a mutable
database: the user can update their profile. The cached value
can become stale.
That one difference — mutability of the underlying data — is the source of all the complexity in production caching.
| Property | DP Memo Table | Infrastructure Cache |
|---|---|---|
| Key | Subproblem inputs | Request parameters (user_id, URL, query) |
| Value | Computed result | Database row, HTTP response, query result |
| Storage | Dictionary in memory | Redis, Memcached, CDN edge |
| Invalidation needed? | No (pure functions) | Yes (data can change) |
| Invalidation mechanism | N/A | TTL, explicit delete, cache-aside, write-through |
| Hit rate target | 100% (subproblems don’t change) | 80–99% (read-heavy workloads) |
| Failure if miss | O(2ⁿ) without memo | Full database query (acceptable but slow) |
Time-to-Live (TTL): every cache entry expires after a fixed duration.
class TTLCache:
"""Cache with per-entry expiration."""
def __init__(self, ttl_seconds: int):
self._store: dict[str, tuple] = {} # key -> (value, expiry_timestamp)
self._ttl = ttl_seconds
def get(self, key: str):
if key not in self._store:
return None
value, expiry = self._store[key]
if time.time() > expiry:
del self._store[key]
return None # cache miss: entry expired
return value # cache hit
def set(self, key: str, value) -> None:
self._store[key] = (value, time.time() + self._ttl)TTL is the simplest invalidation strategy. Its weakness: data staleness for the full TTL duration. If a user updates their profile, cached profiles remain stale for up to TTL seconds. For a 300-second TTL, users may see outdated profiles for 5 minutes.
Explicit invalidation: when data changes, explicitly delete the cached entry.
def update_user_profile(user_id: int, new_data: dict) -> None:
db.execute(f"UPDATE users SET ... WHERE id={user_id}")
r.delete(f"user:{user_id}:profile") # invalidate immediatelyExplicit invalidation eliminates the staleness window — the cache is cleared at the moment data changes. The cost: every write must know which cache keys to invalidate. For simple one-to-one mappings (user_id → profile), this is easy. For complex queries (all users who joined in March), determining which cache keys to invalidate is hard.
Write-through caching: writes go to both the cache and the database simultaneously.
def write_through_update(user_id: int, new_data: dict) -> None:
db.execute(f"UPDATE users SET ... WHERE id={user_id}")
r.set(f"user:{user_id}:profile", json.dumps(new_data)) # update cache inlineWrite-through ensures the cache always reflects the latest write. The cost: every write is slower (two writes instead of one). For write-heavy workloads, this doubles write latency.
Consider a product catalogue service. A product’s price is read thousands of times per minute, so the lookup is cached:
def get_product_price(product_id: int) -> float:
cache_key = f"product:{product_id}:price"
cached = r.get(cache_key)
if cached:
return float(cached)
price = db.query(f"SELECT price FROM products WHERE id={product_id}")
r.setex(cache_key, 600, price) # TTL = 10 minutes
return priceA pricing team runs a flash sale and updates the price. The function above knows nothing about the update. Three strategies handle this:
Strategy 1 — TTL expiry: do nothing. The cache entry expires after 600 seconds. Every request for up to 10 minutes sees the pre-sale price. Simple to implement; the staleness window is exactly the TTL.
Strategy 2 — Event invalidation: the pricing system emits an event on every price change. A subscriber listens and deletes the cache key immediately:
def on_price_changed(event: dict) -> None:
product_id = event["product_id"]
r.delete(f"product:{product_id}:price") # cache cleared at the moment of changeStaleness window is near-zero. Cost: the event pipeline must be reliable — a dropped event leaves stale data with no expiry (unless a background TTL is kept as a safety net).
Strategy 3 — Write-through: the pricing update service writes to the database and the cache in the same call:
def update_price(product_id: int, new_price: float) -> None:
db.execute(f"UPDATE products SET price={new_price} WHERE id={product_id}")
r.setex(f"product:{product_id}:price", 600, new_price) # cache updated inlineNo staleness at all for the specific key written. Cost: the write path becomes slower (two writes), and if the cache write fails after the database write succeeds, the cache holds the old value until TTL expires.
Choosing: TTL alone is acceptable when staleness of up to the TTL duration is tolerable (user profile avatars: 5-minute staleness is fine). Event invalidation is correct when staleness must be near-zero but the event pipeline is reliable (inventory counts, prices). Write-through is correct when you control all write paths and write volume is low (configuration values, feature flags).
Cache entries expire (TTL) or are invalidated (explicit delete). At the moment of expiration, if many processes simultaneously notice a cache miss for the same key, all of them simultaneously query the database. This is the thundering herd (FM7).
import threading
import time
class ThunderingHerdProtectedCache:
"""
Cache with thundering herd protection via mutex per key.
Only one thread fetches a given key; others wait for the result.
"""
def __init__(self):
self._store: dict = {}
self._locks: dict[str, threading.Event] = {}
self._meta_lock = threading.Lock()
def get_or_compute(self, key: str, compute_fn, ttl: int):
cached = self._store.get(key)
if cached and time.time() < cached[1]:
return cached[0] # cache hit, not expired
should_compute = False
with self._meta_lock:
# Re-check under lock (another thread may have filled it)
cached = self._store.get(key)
if cached and time.time() < cached[1]:
return cached[0]
if key in self._locks:
event = self._locks[key]
else:
event = threading.Event()
self._locks[key] = event
should_compute = True
if should_compute:
value = compute_fn()
self._store[key] = (value, time.time() + ttl)
with self._meta_lock:
del self._locks[key]
event.set()
return value
else:
event.wait()
return self._store[key][0]The mutex-per-key pattern ensures only one thread fetches a given key at a time. Other threads wait rather than issuing duplicate database queries. In practice, many systems use a simpler approximation: probabilistic early expiration (PER) — start refreshing the cache slightly before expiry so that it is never empty.
A memo table grows unbounded — every subproblem result is stored forever. A production cache has bounded memory. When the cache is full, something must be evicted. The standard policy is Least Recently Used (LRU): evict the entry that was accessed least recently.
from collections import OrderedDict
class LRUCache:
"""
LRU cache with capacity limit.
Uses OrderedDict to maintain access order in O(1).
"""
def __init__(self, capacity: int):
self.capacity = capacity
self._store: OrderedDict = OrderedDict()
def get(self, key: str):
if key not in self._store:
return None
self._store.move_to_end(key) # mark as recently used
return self._store[key]
def set(self, key: str, value) -> None:
if key in self._store:
self._store.move_to_end(key)
self._store[key] = value
if len(self._store) > self.capacity:
self._store.popitem(last=False) # evict least recently usedLRU eviction exploits temporal locality: recently accessed items are more likely to be accessed again. For social media profiles (popular users accessed repeatedly), LRU correctly keeps hot items in cache. For batch data processing (each item accessed once), LRU provides no benefit — every item is evicted exactly when the next item fills the cache.
AT4 — Precomputation vs. On-Demand
Caching is precomputation: you compute the result once and store it, paying the database query cost on the first miss, then serving all subsequent requests from cache. The benefit grows with the read-to-write ratio. For a user profile read 10,000 times per day but updated once, the cache pays off 9,999 times. For a user profile updated 10 times per day and read 11 times, the cache adds complexity without proportional benefit.
AT1 — Consistency vs. Availability
A cache hit avoids the database — it is fast and available. A cache hit on stale data is incorrect — it sacrifices consistency for availability. Every caching decision is this tradeoff. TTL is the explicit control knob: shorter TTL = more consistent, more database load; longer TTL = faster, more stale.
Four strategies, each occupying a different position on the consistency-complexity axis:
| Strategy | When to Use | Consistency Guarantee | Operational Complexity |
|---|---|---|---|
| TTL-based | Staleness up to TTL duration is tolerable; writes are frequent or write paths are many | Stale reads for up to TTL seconds | Low — set TTL at write time; no additional plumbing |
| Event-driven invalidation | Near-zero staleness required; a reliable event pipeline exists; write paths are few and known | Near-zero staleness (bounded by event pipeline latency) | Medium — requires event publisher, subscriber, and dead-letter handling |
| Cache-aside (lazy loading) | Read-heavy workload; cold start is acceptable; write patterns are unpredictable | Stale until next read after TTL expiry or explicit delete | Low-medium — application manages cache fill on miss |
| Write-through | Write volume is low; all write paths are controlled; stale reads are unacceptable | No staleness for the written key (assumes cache write succeeds) | Medium-high — every write path must update cache; failure between DB and cache write requires recovery |
The strategies are not mutually exclusive. Production systems layer them: write-through for high-value keys (inventory, session tokens) + TTL as a safety net + event invalidation for price or availability data. The TTL safety net ensures that even if an event is dropped, stale data eventually expires rather than persisting indefinitely.
FM7 — Thundering Herd
A cache that serves 99% of reads suddenly expires. In the moment of cache miss, thousands of simultaneous requests hit the database. The database, which handles 100 req/s normally, receives 10,000 requests simultaneously. It becomes overwhelmed. The cache misses take longer than usual because the database is overloaded. The cache fills slowly. The overload persists. Solutions: staggered TTLs (add random jitter to expiration times), background refresh (refresh before expiry), request coalescing (only one thread fetches, others wait).
FM9 — Silent Data Corruption
A cache that is never invalidated returns stale data indefinitely. If the invalidation logic has a bug (a code path that updates the database without invalidating the cache), stale data accumulates silently. Users see outdated information. The system appears healthy — cache hit rates are high, database load is low. The staleness is invisible until a user reports incorrect data. Monitoring cache freshness (sample cache entries and compare to source of truth) is the detection mechanism.
Redis is the dominant infrastructure cache. It stores key-value pairs in memory with optional TTL. It supports complex data structures (sorted sets, lists, hashes) beyond simple string values. Its persistence options (RDB snapshots, AOF logging) allow cache warm-up after restart.
Memcached is a simpler alternative: key-value only, no data structure support, no persistence. It shards data across multiple nodes without the overhead of Redis replication. For pure caching workloads (no persistence needed), Memcached is operationally simpler.
CDN edge caches (Cloudflare, Fastly, Akamai) apply
the same structure to HTTP responses. A URL is the cache key; the HTTP
response body is the cached value;
Cache-Control: max-age=300 is the TTL. Cache-Control
headers are the same interface as programmatic TTL, applied at the HTTP
protocol level.
Database query result caches: some databases cache the results of frequently repeated queries. MySQL’s query cache (removed in MySQL 8.0) stored SELECT results keyed by the exact query string. PostgreSQL takes a different approach: it caches query plans via prepared statements and keeps frequently accessed data pages in shared_buffers, but does not cache query results directly. The database is acting as its own cache layer for common read patterns.
Concept: Memoization as Infrastructure Caching (the bridge chapter)
Core Idea: A DP memo table and an infrastructure cache have identical structure: key = subproblem/request, value = result. The difference is mutability: DP subproblems are pure functions; infrastructure caches must handle data change. Cache invalidation — deciding when a cached value is stale — is the entire engineering problem. TTL is the universal escape hatch: tolerate staleness up to a configurable window. Explicit invalidation eliminates staleness but requires every writer to know every cache key.
Tradeoff: AT1 — Consistency vs. Availability (shorter TTL = fresher data, more database load; longer TTL = faster, potentially stale; the cache is always choosing between these)
Failure Mode: FM7 — Thundering Herd (cache expiry under high load causes simultaneous database queries from all waiting clients; protect with jitter, background refresh, or request coalescing)
Signal: When the same expensive computation or query is being repeated with the same inputs — add a cache. When cache hit rate drops below 80% for a read-heavy workload — the key structure or TTL policy is wrong. When a cache sits in front of a database that changes infrequently — a long TTL with explicit invalidation is the right model.
State the one structural difference between a DP memo table and an infrastructure cache. What engineering problem does this difference create?
What is the thundering herd problem? At what moment does it occur? Name two techniques that prevent it.
Why does LRU eviction work well for social media profiles but poorly for batch data processing? What access pattern assumption does LRU rely on?
A memoized API response is stale. The user sees yesterday’s data. Name the FM code and the AT code that governs the tradeoff.
A web application serves user timelines. Each timeline is computed by querying the 50 most recent posts from all the user’s followers. The computation takes 200ms. The timeline is cached with a 5-minute TTL.
User Alice has 10,000 followers. She posts a new update. How long until the cached timeline reflects her post?
Alice’s post goes viral. 50,000 requests for her followers’ timelines arrive within 30 seconds of her post. The cache expires at this moment. Describe the thundering herd event that occurs.
Propose three modifications to the caching strategy that address the thundering herd while maintaining < 10-second staleness for new posts.
A recommendation service generates personalised product recommendations for 50 million users. Each recommendation list is computed by a machine learning model that takes 2 seconds per user. Users browse the platform 3 times per day on average; their interests change slowly (days to weeks).
Design the caching strategy: what to cache, what TTL to use, what invalidation strategy. Justify each choice.
A user makes a purchase. Their recommendations should update within 60 seconds to reflect the purchase. How does this change your invalidation strategy?
The ML model is retrained weekly with new data. On retraining, all 50 million cached recommendations become stale simultaneously. How would you handle the cache warm-up after retraining without causing a thundering herd? Name every tradeoff using AT notation.
A complete answer will: derive the result from first principles, cite the relevant codes, and show what the answer implies for design or operation.