Memoization Is Caching

Introduction

This is the most important chapter in Books 1 and 2.

Not because the algorithm is complicated — it is not. The algorithm is simple: before computing something, check whether you have already computed it. If yes, return the stored result. If no, compute it, store the result, and return.

Every piece of infrastructure you will study in Book 3 — caching layers, CDNs, read replicas, materialized views — is built on this idea. And every engineer who understands the idea at the algorithm level will understand infrastructure caching at the design level, without memorising a catalogue of systems.

In Book 1, Chapter 21, you added a dictionary to a recursive function and called it memoization. You saw it reduce an exponential-time computation to linear. Now you will see that the dictionary is a cache. The subproblem is the cache key. The computed result is the cached value. The only difference between a DP memo table and a Redis instance is this: Redis cannot rely on the underlying data remaining constant. The problem that follows from that difference is where all the complexity lives.

Thread Activation

In Book 1, Ch 21, memoization transformed the Fibonacci function from O(2ⁿ) to O(N) by storing results in a dictionary. The pattern: check if memo[key] exists; if yes, return it; if no, compute, store, return.

This chapter is the explicit bridge for Thread 5 (Caching/Memoization): DP memoization (Book 1, Ch 21) → infrastructure caching (this chapter) → CDN and cache layer (Book 3, Ch 6–7). The mathematical structure is identical. The engineering problem added by infrastructure caching is cache invalidation — the memo table assumes the underlying function is pure (same input → always same output). Infrastructure caches must handle the case where the underlying data changes.

The Concept

Memoization in Book 1:

def fib(n: int, memo: dict = {}) -> int:
    if n in memo:
        return memo[n]           # cache hit
    if n <= 1:
        return n
    result = fib(n-1, memo) + fib(n-2, memo)
    memo[n] = result             # cache fill
    return result

Infrastructure caching in production:

import redis

r = redis.Redis()

def get_user_profile(user_id: int) -> dict:
    cache_key = f"user:{user_id}:profile"
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)      # cache hit
    profile = db.query(f"SELECT * FROM users WHERE id={user_id}")
    r.setex(cache_key, 300, json.dumps(profile))  # cache fill, TTL=300s
    return profile

The structure is identical. The difference: - fib(n) is a pure function: the same input always produces the same output. The memo table never needs invalidation. - get_user_profile(user_id) reads from a mutable database: the user can update their profile. The cached value can become stale.

That one difference — mutability of the underlying data — is the source of all the complexity in production caching.

How It Works

Side-by-Side: Memo Table vs. Infrastructure Cache

Property	DP Memo Table	Infrastructure Cache
Key	Subproblem inputs	Request parameters (user_id, URL, query)
Value	Computed result	Database row, HTTP response, query result
Storage	Dictionary in memory	Redis, Memcached, CDN edge
Invalidation needed?	No (pure functions)	Yes (data can change)
Invalidation mechanism	N/A	TTL, explicit delete, cache-aside, write-through
Hit rate target	100% (subproblems don’t change)	80–99% (read-heavy workloads)
Failure if miss	O(2ⁿ) without memo	Full database query (acceptable but slow)

Cache Invalidation Strategies

Time-to-Live (TTL): every cache entry expires after a fixed duration.

class TTLCache:
    """Cache with per-entry expiration."""
    def __init__(self, ttl_seconds: int):
        self._store: dict[str, tuple] = {}  # key -> (value, expiry_timestamp)
        self._ttl = ttl_seconds

    def get(self, key: str):
        if key not in self._store:
            return None
        value, expiry = self._store[key]
        if time.time() > expiry:
            del self._store[key]
            return None            # cache miss: entry expired
        return value               # cache hit

    def set(self, key: str, value) -> None:
        self._store[key] = (value, time.time() + self._ttl)

TTL is the simplest invalidation strategy. Its weakness: data staleness for the full TTL duration. If a user updates their profile, cached profiles remain stale for up to TTL seconds. For a 300-second TTL, users may see outdated profiles for 5 minutes.

Explicit invalidation: when data changes, explicitly delete the cached entry.

def update_user_profile(user_id: int, new_data: dict) -> None:
    db.execute(f"UPDATE users SET ... WHERE id={user_id}")
    r.delete(f"user:{user_id}:profile")  # invalidate immediately

Explicit invalidation eliminates the staleness window — the cache is cleared at the moment data changes. The cost: every write must know which cache keys to invalidate. For simple one-to-one mappings (user_id → profile), this is easy. For complex queries (all users who joined in March), determining which cache keys to invalidate is hard.

Write-through caching: writes go to both the cache and the database simultaneously.

def write_through_update(user_id: int, new_data: dict) -> None:
    db.execute(f"UPDATE users SET ... WHERE id={user_id}")
    r.set(f"user:{user_id}:profile", json.dumps(new_data))  # update cache inline

Write-through ensures the cache always reflects the latest write. The cost: every write is slower (two writes instead of one). For write-heavy workloads, this doubles write latency.

Worked Example: Cache Invalidation When Data Changes

Consider a product catalogue service. A product’s price is read thousands of times per minute, so the lookup is cached:

def get_product_price(product_id: int) -> float:
    cache_key = f"product:{product_id}:price"
    cached = r.get(cache_key)
    if cached:
        return float(cached)
    price = db.query(f"SELECT price FROM products WHERE id={product_id}")
    r.setex(cache_key, 600, price)   # TTL = 10 minutes
    return price

A pricing team runs a flash sale and updates the price. The function above knows nothing about the update. Three strategies handle this:

Strategy 1 — TTL expiry: do nothing. The cache entry expires after 600 seconds. Every request for up to 10 minutes sees the pre-sale price. Simple to implement; the staleness window is exactly the TTL.

Strategy 2 — Event invalidation: the pricing system emits an event on every price change. A subscriber listens and deletes the cache key immediately:

def on_price_changed(event: dict) -> None:
    product_id = event["product_id"]
    r.delete(f"product:{product_id}:price")   # cache cleared at the moment of change

Staleness window is near-zero. Cost: the event pipeline must be reliable — a dropped event leaves stale data with no expiry (unless a background TTL is kept as a safety net).

Strategy 3 — Write-through: the pricing update service writes to the database and the cache in the same call:

def update_price(product_id: int, new_price: float) -> None:
    db.execute(f"UPDATE products SET price={new_price} WHERE id={product_id}")
    r.setex(f"product:{product_id}:price", 600, new_price)   # cache updated inline

No staleness at all for the specific key written. Cost: the write path becomes slower (two writes), and if the cache write fails after the database write succeeds, the cache holds the old value until TTL expires.

Choosing: TTL alone is acceptable when staleness of up to the TTL duration is tolerable (user profile avatars: 5-minute staleness is fine). Event invalidation is correct when staleness must be near-zero but the event pipeline is reliable (inventory counts, prices). Write-through is correct when you control all write paths and write volume is low (configuration values, feature flags).

The Thundering Herd

Cache entries expire (TTL) or are invalidated (explicit delete). At the moment of expiration, if many processes simultaneously notice a cache miss for the same key, all of them simultaneously query the database. This is the thundering herd (FM7).

import threading
import time

class ThunderingHerdProtectedCache:
    """
    Cache with thundering herd protection via mutex per key.
    Only one thread fetches a given key; others wait for the result.
    """
    def __init__(self):
        self._store: dict = {}
        self._locks: dict[str, threading.Event] = {}
        self._meta_lock = threading.Lock()

    def get_or_compute(self, key: str, compute_fn, ttl: int):
        cached = self._store.get(key)
        if cached and time.time() < cached[1]:
            return cached[0]  # cache hit, not expired

        should_compute = False
        with self._meta_lock:
            # Re-check under lock (another thread may have filled it)
            cached = self._store.get(key)
            if cached and time.time() < cached[1]:
                return cached[0]

            if key in self._locks:
                event = self._locks[key]
            else:
                event = threading.Event()
                self._locks[key] = event
                should_compute = True

        if should_compute:
            value = compute_fn()
            self._store[key] = (value, time.time() + ttl)
            with self._meta_lock:
                del self._locks[key]
            event.set()
            return value
        else:
            event.wait()
            return self._store[key][0]

The mutex-per-key pattern ensures only one thread fetches a given key at a time. Other threads wait rather than issuing duplicate database queries. In practice, many systems use a simpler approximation: probabilistic early expiration (PER) — start refreshing the cache slightly before expiry so that it is never empty.

LRU Cache: Managing Capacity

A memo table grows unbounded — every subproblem result is stored forever. A production cache has bounded memory. When the cache is full, something must be evicted. The standard policy is Least Recently Used (LRU): evict the entry that was accessed least recently.

from collections import OrderedDict

class LRUCache:
    """
    LRU cache with capacity limit.
    Uses OrderedDict to maintain access order in O(1).
    """
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store: OrderedDict = OrderedDict()

    def get(self, key: str):
        if key not in self._store:
            return None
        self._store.move_to_end(key)   # mark as recently used
        return self._store[key]

    def set(self, key: str, value) -> None:
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

LRU eviction exploits temporal locality: recently accessed items are more likely to be accessed again. For social media profiles (popular users accessed repeatedly), LRU correctly keeps hot items in cache. For batch data processing (each item accessed once), LRU provides no benefit — every item is evicted exactly when the next item fills the cache.

Tradeoffs

AT4 — Precomputation vs. On-Demand

Caching is precomputation: you compute the result once and store it, paying the database query cost on the first miss, then serving all subsequent requests from cache. The benefit grows with the read-to-write ratio. For a user profile read 10,000 times per day but updated once, the cache pays off 9,999 times. For a user profile updated 10 times per day and read 11 times, the cache adds complexity without proportional benefit.

AT1 — Consistency vs. Availability

A cache hit avoids the database — it is fast and available. A cache hit on stale data is incorrect — it sacrifices consistency for availability. Every caching decision is this tradeoff. TTL is the explicit control knob: shorter TTL = more consistent, more database load; longer TTL = faster, more stale.

Redis Invalidation Strategy Decision Framework

Four strategies, each occupying a different position on the consistency-complexity axis:

Strategy	When to Use	Consistency Guarantee	Operational Complexity
TTL-based	Staleness up to TTL duration is tolerable; writes are frequent or write paths are many	Stale reads for up to TTL seconds	Low — set TTL at write time; no additional plumbing
Event-driven invalidation	Near-zero staleness required; a reliable event pipeline exists; write paths are few and known	Near-zero staleness (bounded by event pipeline latency)	Medium — requires event publisher, subscriber, and dead-letter handling
Cache-aside (lazy loading)	Read-heavy workload; cold start is acceptable; write patterns are unpredictable	Stale until next read after TTL expiry or explicit delete	Low-medium — application manages cache fill on miss
Write-through	Write volume is low; all write paths are controlled; stale reads are unacceptable	No staleness for the written key (assumes cache write succeeds)	Medium-high — every write path must update cache; failure between DB and cache write requires recovery

The strategies are not mutually exclusive. Production systems layer them: write-through for high-value keys (inventory, session tokens) + TTL as a safety net + event invalidation for price or availability data. The TTL safety net ensures that even if an event is dropped, stale data eventually expires rather than persisting indefinitely.

Where It Fails

FM7 — Thundering Herd

A cache that serves 99% of reads suddenly expires. In the moment of cache miss, thousands of simultaneous requests hit the database. The database, which handles 100 req/s normally, receives 10,000 requests simultaneously. It becomes overwhelmed. The cache misses take longer than usual because the database is overloaded. The cache fills slowly. The overload persists. Solutions: staggered TTLs (add random jitter to expiration times), background refresh (refresh before expiry), request coalescing (only one thread fetches, others wait).

FM9 — Silent Data Corruption

A cache that is never invalidated returns stale data indefinitely. If the invalidation logic has a bug (a code path that updates the database without invalidating the cache), stale data accumulates silently. Users see outdated information. The system appears healthy — cache hit rates are high, database load is low. The staleness is invisible until a user reports incorrect data. Monitoring cache freshness (sample cache entries and compare to source of truth) is the detection mechanism.

Real Systems

Redis is the dominant infrastructure cache. It stores key-value pairs in memory with optional TTL. It supports complex data structures (sorted sets, lists, hashes) beyond simple string values. Its persistence options (RDB snapshots, AOF logging) allow cache warm-up after restart.

Memcached is a simpler alternative: key-value only, no data structure support, no persistence. It shards data across multiple nodes without the overhead of Redis replication. For pure caching workloads (no persistence needed), Memcached is operationally simpler.

CDN edge caches (Cloudflare, Fastly, Akamai) apply the same structure to HTTP responses. A URL is the cache key; the HTTP response body is the cached value; Cache-Control: max-age=300 is the TTL. Cache-Control headers are the same interface as programmatic TTL, applied at the HTTP protocol level.

Database query result caches: some databases cache the results of frequently repeated queries. MySQL’s query cache (removed in MySQL 8.0) stored SELECT results keyed by the exact query string. PostgreSQL takes a different approach: it caches query plans via prepared statements and keeps frequently accessed data pages in shared_buffers, but does not cache query results directly. The database is acting as its own cache layer for common read patterns.

Concept: Memoization as Infrastructure Caching (the bridge chapter)

Core Idea: A DP memo table and an infrastructure cache have identical structure: key = subproblem/request, value = result. The difference is mutability: DP subproblems are pure functions; infrastructure caches must handle data change. Cache invalidation — deciding when a cached value is stale — is the entire engineering problem. TTL is the universal escape hatch: tolerate staleness up to a configurable window. Explicit invalidation eliminates staleness but requires every writer to know every cache key.

Tradeoff: AT1 — Consistency vs. Availability (shorter TTL = fresher data, more database load; longer TTL = faster, potentially stale; the cache is always choosing between these)

Failure Mode: FM7 — Thundering Herd (cache expiry under high load causes simultaneous database queries from all waiting clients; protect with jitter, background refresh, or request coalescing)

Signal: When the same expensive computation or query is being repeated with the same inputs — add a cache. When cache hit rate drops below 80% for a read-heavy workload — the key structure or TTL policy is wrong. When a cache sits in front of a database that changes infrequently — a long TTL with explicit invalidation is the right model.

Exercises

Level 1 — Understand

State the one structural difference between a DP memo table and an infrastructure cache. What engineering problem does this difference create?
What is the thundering herd problem? At what moment does it occur? Name two techniques that prevent it.
Why does LRU eviction work well for social media profiles but poorly for batch data processing? What access pattern assumption does LRU rely on?
A memoized API response is stale. The user sees yesterday’s data. Name the FM code and the AT code that governs the tradeoff.

Level 2 — Apply

A web application serves user timelines. Each timeline is computed by querying the 50 most recent posts from all the user’s followers. The computation takes 200ms. The timeline is cached with a 5-minute TTL.

User Alice has 10,000 followers. She posts a new update. How long until the cached timeline reflects her post?
Alice’s post goes viral. 50,000 requests for her followers’ timelines arrive within 30 seconds of her post. The cache expires at this moment. Describe the thundering herd event that occurs.
Propose three modifications to the caching strategy that address the thundering herd while maintaining < 10-second staleness for new posts.

Level 3 — Design

A recommendation service generates personalised product recommendations for 50 million users. Each recommendation list is computed by a machine learning model that takes 2 seconds per user. Users browse the platform 3 times per day on average; their interests change slowly (days to weeks).

Design the caching strategy: what to cache, what TTL to use, what invalidation strategy. Justify each choice.
A user makes a purchase. Their recommendations should update within 60 seconds to reflect the purchase. How does this change your invalidation strategy?
The ML model is retrained weekly with new data. On retraining, all 50 million cached recommendations become stale simultaneously. How would you handle the cache warm-up after retraining without causing a thundering herd? Name every tradeoff using AT notation.

A complete answer will: derive the result from first principles, cite the relevant codes, and show what the answer implies for design or operation.

Read in the book →