On election night in 2008, Twitter received 160 tweets per second — a record at the time. The infrastructure held. It had not always: the service had collapsed several times before, during events generating far less traffic, because the engineers had not yet built the thing that would save it. The fix was not more hardware. It was a gate — a mechanism that counted incoming requests and rejected the ones arriving too fast.
That gate is rate limiting. And the failure mode it exists to prevent — FM3, Unbounded Resource Consumption — is the one that turns a traffic spike into an outage. The trap is subtle: the failure is rarely the spike itself. It is the assumption that the gate will always be there.
What FM3 Actually Looks Like
FM3 is any resource that grows without limit — memory, connections, threads, or, most visibly, a queue. A queue with an arrival rate higher than its drain rate does not slow down gracefully. It grows. It grows until it exhausts memory, and then the process that owns it dies, and everything behind it dies too.
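The arithmetic behind that growth can be made concrete with a small simulation. This is a minimal sketch under simplifying assumptions: arrival and drain rates are constant, time advances in whole seconds, and the function name `backlog_over_time` is purely illustrative.

```python
def backlog_over_time(arrival_rate, drain_rate, seconds):
    """Return the queue depth at the end of each second for constant rates."""
    depth = 0
    history = []
    for _ in range(seconds):
        # Depth changes by the difference between arrivals and drains,
        # and a queue can never hold fewer than zero items.
        depth = max(0, depth + arrival_rate - drain_rate)
        history.append(depth)
    return history


# Arrivals exceed the drain rate by 20 req/s: the backlog grows by 20 every second.
growing = backlog_over_time(arrival_rate=120, drain_rate=100, seconds=5)

# Arrivals stay below the drain rate: the queue stays empty.
stable = backlog_over_time(arrival_rate=80, drain_rate=100, seconds=5)

print(growing)  # [20, 40, 60, 80, 100]
print(stable)   # [0, 0, 0, 0, 0]
```

Nothing in the growing case slows the increase. The only bound is the memory of the process that holds the queue.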
Rate limiting is the standard defence. The mechanism sounds trivial — count requests, reject excess — but the counting algorithm determines whether the system is fair, whether it tolerates legitimate bursts, and whether it stays accurate when the counter is shared across machines. Four algorithms are in standard use, and each is a different contract.
Four Ways to Count
Token bucket — tokens accumulate at a fixed rate; each request spends one; an empty bucket rejects. An idle client banks tokens and can spend them in a burst. This matches how real API clients behave — brief bursts, then idle — which is why it is the most common choice in production APIs.
Leaky bucket — requests enter a queue that drains at a constant rate. Output is perfectly smooth, but legitimate bursty clients are penalised and pay queuing latency.
Fixed window — count requests per fixed window (say, per minute). Simple, but it has a sharp flaw: a client limited to 100/minute can send 100 at 00:59 and 100 at 01:00. That is 200 requests in about two seconds: twice the limit within a single minute, and roughly 60× the intended average rate of 100 requests per 60 seconds. That boundary spike is FM3 leaking straight through the supposed defence. A telltale sign: error rates spiking at round-minute boundaries.
Sliding window — count only requests within the last N seconds. No boundary spike, but it stores one timestamp per request inside the window: O(requests) memory per client instead of O(1). For 10,000 clients each sending 1,000 req/min under a one-minute window, that is ten million timestamps held in memory at once. The rate limiter built to prevent FM3 has just created a smaller FM3 inside itself.
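Of the four, the token bucket is the one most often written by hand, so it is worth seeing in code. The following is a minimal single-process sketch, not a production implementation: the class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for illustration, and there is no locking for concurrent callers.

```python
import time


class TokenBucket:
    """Single-process token bucket (illustrative sketch, not thread-safe)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum tokens an idle client can bank
        self.clock = clock            # injectable so the example is deterministic
        self.tokens = float(capacity) # start with a full bucket
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# A fake clock stands in for real time so the behaviour is repeatable.
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=5, clock=lambda: fake_now[0])

# A burst of six requests: the five banked tokens are spent, the sixth is rejected.
burst = [bucket.allow() for _ in range(6)]

# Two seconds of idleness refill two tokens.
fake_now[0] += 2.0
after_wait = [bucket.allow() for _ in range(3)]

print(burst)       # [True, True, True, True, True, False]
print(after_wait)  # [True, True, False]
```

The `capacity` parameter is the burst allowance and `rate` is the sustained limit. Keeping them separate is what lets an idle client spend banked tokens without raising its long-run rate.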
Fixed-window boundary spike — FM3 leaking through the gate
limit: 100 requests / minute
window 00:00──────────00:59 │ window 01:00──────────01:59
                       ▲▲▲▲ │ ▲▲▲▲
                   100 reqs │ 100 reqs
                      └─────┴─────┘
               200 requests in ~2 seconds
               = about 60× the intended rate
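The boundary spike is easy to reproduce. The sketch below replays the same burst, 100 requests at second 59 and 100 at second 60, through a fixed-window counter and through a sliding-window log. Both classes are illustrative single-client sketches with hypothetical names, and timestamps are passed in explicitly rather than read from a clock.

```python
from collections import deque


class FixedWindowLimiter:
    """Counts requests per aligned window; the count resets at each boundary."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.current_window = None
        self.count = 0

    def allow(self, now):
        window_id = int(now // self.window)
        if window_id != self.current_window:
            # A new window has started: forget everything counted so far.
            self.current_window = window_id
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingLogLimiter:
    """Keeps one timestamp per admitted request within the last window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()

    def allow(self, now):
        # Drop timestamps that have aged out of the trailing window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


def count_admitted(limiter, arrivals):
    return sum(limiter.allow(t) for t in arrivals)


# 100 requests in the last second of one window, 100 in the first second of the next.
arrivals = [59.0] * 100 + [60.0] * 100

fixed_admitted = count_admitted(FixedWindowLimiter(100, 60), arrivals)
sliding_admitted = count_admitted(SlidingLogLimiter(100, 60), arrivals)

print(fixed_admitted)    # 200
print(sliding_admitted)  # 100
```

The fixed window admits the whole burst because its counter resets between the two batches. The sliding log rejects the second batch, at the cost of holding 100 timestamps for this one client.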
The Tradeoff: Synchronous vs Asynchronous
In a single process the counter lives in local memory. Across many service instances it must be shared — typically in Redis — and now every rate-limit check is a network round-trip. That is AT10 (Synchronous vs Asynchronous).
A synchronous shared counter is accurate: every instance sees the true count. But a 1ms Redis round-trip adds 0.5% to a 200ms API call and 20% to a 5ms internal service call, which is fine for the first and unacceptable for the second. The asynchronous alternative, local in-process counters that sync with shared state periodically, adds almost no latency, at the cost of brief over-allowance when a burst arrives before the sync catches up. High-performance systems take that trade deliberately: a little inaccuracy beats per-request coordination cost.
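The size of that over-allowance can be seen in a small simulation. This is a sketch under stated assumptions: `SharedCounter` is an in-memory stand-in for a shared store such as Redis, `LocalSyncedLimiter` is a hypothetical name, the counter covers a single window with no reset, and each instance syncs after every `sync_every` admitted requests rather than on a timer so that the run is deterministic.

```python
class SharedCounter:
    """In-memory stand-in for a shared store (assumption: not a real Redis client)."""

    def __init__(self):
        self.total = 0
        self.round_trips = 0

    def add_and_get(self, delta):
        # Each call models one network round-trip to the shared store.
        self.round_trips += 1
        self.total += delta
        return self.total


class LocalSyncedLimiter:
    """Counts locally and reports to the shared store in batches."""

    def __init__(self, shared, limit, sync_every):
        self.shared = shared
        self.limit = limit
        self.sync_every = sync_every
        self.known_global = 0  # last total seen from the shared store
        self.pending = 0       # admitted locally but not yet reported

    def allow(self):
        # Decide using possibly stale global state plus our own unreported count.
        if self.known_global + self.pending >= self.limit:
            return False
        self.pending += 1
        if self.pending >= self.sync_every:
            self.known_global = self.shared.add_and_get(self.pending)
            self.pending = 0
        return True


shared = SharedCounter()
instance_a = LocalSyncedLimiter(shared, limit=100, sync_every=10)
instance_b = LocalSyncedLimiter(shared, limit=100, sync_every=10)

# 200 requests arrive, alternating between the two instances.
total_admitted = 0
for _ in range(100):
    total_admitted += instance_a.allow()
    total_admitted += instance_b.allow()

print(total_admitted)      # 110: ten requests over the shared limit of 100
print(shared.round_trips)  # 11, versus 200 if every check went to the store
```

In this run the limiter over-allows by one sync batch and saves 189 round-trips. A smaller `sync_every` tightens accuracy and raises coordination cost, which is the AT10 trade in miniature.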
Where It Fails: The Gate That Was the Only Gate
Here is the real FM3 trap. Many production rate limiters fail open during a Redis outage: when the shared counter is unreachable they admit every request, on the reasoning that an unprotected service is better than an inaccessible one. That is the right call, but it has a precondition that teams forget: the service behind the rate limiter must be able to survive without the protection.
If the backend has no timeouts, no circuit breakers, no bulkheads — if rate limiting is the only layer of defence — then a Redis outage removes that single layer and the backend faces unbounded traffic. The failure mode is not the Redis outage. It is the design assumption that one gate was enough. FM3 is a defence-in-depth failure: a service relying solely on its rate limiter has not designed for the day the rate limiter is gone.
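One way to picture a second layer is a bulkhead that caps in-flight work regardless of what the limiter says. The following is a minimal sketch, not a recommended production design: `StoreUnavailable`, `FailOpenLimiter`, `Bulkhead`, and `handle` are hypothetical names, and the status codes are returned as plain integers instead of going through a web framework.

```python
import threading


class StoreUnavailable(Exception):
    """Raised by the hypothetical shared counter when it cannot be reached."""


class FailOpenLimiter:
    """Wraps a rate-limit check and admits everything if the store is down."""

    def __init__(self, check):
        self.check = check  # callable(client_id) -> bool; may raise StoreUnavailable

    def allow(self, client_id):
        try:
            return self.check(client_id)
        except StoreUnavailable:
            return True  # fail open: the gate is gone, so let the request through


class Bulkhead:
    """Second layer: caps concurrent in-flight requests to the backend."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_enter(self):
        return self._slots.acquire(blocking=False)

    def leave(self):
        self._slots.release()


def handle(client_id, limiter, bulkhead, work):
    if not limiter.allow(client_id):
        return 429  # rate limited
    if not bulkhead.try_enter():
        return 503  # shed load instead of queueing without bound
    try:
        work()
        return 200
    finally:
        bulkhead.leave()


def broken_store(client_id):
    raise StoreUnavailable("shared counter unreachable")


limiter = FailOpenLimiter(broken_store)
bulkhead = Bulkhead(max_in_flight=2)

# Simulate two requests already in flight while the limiter is failing open.
bulkhead.try_enter()
bulkhead.try_enter()
status_when_full = handle("client-a", limiter, bulkhead, work=lambda: None)

# One in-flight request finishes, so the next one fits.
bulkhead.leave()
status_with_room = handle("client-a", limiter, bulkhead, work=lambda: None)

print(status_when_full)  # 503
print(status_with_room)  # 200
```

With the store down, the limiter admits every request, yet the backend still refuses work beyond its own capacity. The rejection moves from 429 to 503, but the resource stays bounded.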
There is a second exposure. A distributed limiter that stores a high-volume client's counter on one Redis shard routes all of that client's checks through one shard — FM6 (Hotspotting). A large enterprise customer or a misbehaving bot becomes a hot shard, and the hot shard itself becomes the next thing to exhaust.
Real Systems
Stripe uses a token bucket — a per-second and a per-minute bucket per API key — and returns HTTP 429 with a Retry-After header so clients know exactly when to come back. Cloudflare rate-limits at the edge with a sliding-window approximation: a precise count for the current window plus an interpolated count from the previous one, weighted by elapsed time — accurate to within 0.1% without storing per-client timestamps. Google Cloud API Gateway pairs the two: a token bucket protects against individual client abuse, a leaky bucket smooths aggregate load so the downstream service is never overwhelmed.
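The sliding-window approximation described above can be sketched in a few lines. This is an illustration of the general technique only, not Cloudflare's code: the class name `SlidingWindowCounter` is hypothetical, it tracks a single client, and timestamps are passed in explicitly. The estimate is the current window's count plus the previous window's count weighted by the fraction of the trailing window that still overlaps it.

```python
class SlidingWindowCounter:
    """Two counters per client instead of one timestamp per request."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.window_id = 0
        self.current = 0   # requests admitted in the current aligned window
        self.previous = 0  # requests admitted in the window before it

    def _roll(self, now):
        window_id = int(now // self.window)
        if window_id == self.window_id + 1:
            # Moved into the next window: the current count becomes the previous one.
            self.previous, self.current = self.current, 0
        elif window_id > self.window_id + 1:
            # More than a full window of silence: nothing overlaps any more.
            self.previous, self.current = 0, 0
        self.window_id = window_id

    def estimate(self, now):
        self._roll(now)
        elapsed_fraction = (now % self.window) / self.window
        # The previous window counts only for the part the trailing window still covers.
        return self.current + self.previous * (1.0 - elapsed_fraction)

    def allow(self, now):
        if self.estimate(now) >= self.limit:
            return False
        self.current += 1
        return True


window_limiter = SlidingWindowCounter(limit=100, window_seconds=60)

first_batch = sum(window_limiter.allow(59.0) for _ in range(100))
at_boundary = sum(window_limiter.allow(60.0) for _ in range(100))
half_window_later = sum(window_limiter.allow(90.0) for _ in range(100))

print(first_batch)        # 100
print(at_boundary)        # 0: the previous window still counts in full
print(half_window_later)  # 50: half of the previous window has aged out
```

Memory per client stays at two integers, and the boundary burst that defeats a fixed window is rejected. The price is that the estimate assumes requests in the previous window were evenly spread, which is where the small inaccuracy comes from.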
The One Sentence
A queue that never stops growing is FM3, and a rate limiter is the gate that stops it — but the gate is only as good as the failure plan behind it, because the day Redis goes down and the limiter fails open, the only thing standing between a traffic spike and an outage is whether the backend can survive without the gate at all.
Concept: A queue whose arrival rate exceeds its drain rate grows without limit — FM3 — and a rate limiter is the gate that stops it.
Core Idea: Four counting algorithms — token bucket, leaky bucket, fixed window, sliding window — each a different contract for fairness, burst tolerance, and memory cost.
Tradeoff: AT10 — Synchronous vs Asynchronous: a shared synchronous counter is accurate but adds a network round-trip; local async counters are fast but briefly over-allow.
Failure Mode: FM3 — Unbounded Resource Consumption: when the limiter fails open during a Redis outage, an undefended backend faces unbounded traffic.
Signal: When a rate limiter is the only layer of defence, the day it fails open is the day the outage arrives.
Series: Book 3, Ch 5