The Computing Series

Real Systems

DynamoDB uses consistent hashing to distribute data across nodes. Each item’s partition key is run through a hash function, and the resulting value determines which partition the item lives on. When DynamoDB adds capacity, only the key ranges adjacent to the new nodes need to migrate data.

Cassandra uses a consistent hash ring with virtual nodes (called “vnodes” in Cassandra documentation). Each node is assigned multiple token ranges on the ring. The token range assignment is configurable; nodes with more capacity receive more tokens.

CDNs (Akamai, Fastly, Cloudflare) use consistent hashing to route requests to edge nodes. A request for a given URL is always routed to the same edge node (or a small set of edge nodes), maximising cache hit rates. When edge nodes are added (for example, during a traffic spike), consistent hashing ensures most cached content stays on the nodes that already hold it.


Concept: Consistent Hashing

Thread: T1 (Hashing) ← hash table (Book 1, Ch 22), modular arithmetic (Book 1, Ch 39) → sharding (Book 3, Ch 4–5)

Core Idea: Map both keys and nodes onto a circular ring using the same hash function. A key is served by the first node clockwise from its ring position. When a node joins or leaves, only the keys between the new/removed node and its predecessor migrate — on average K/N keys out of K total. Virtual nodes (multiple ring positions per physical node) correct uneven load distribution.
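The ring described above can be sketched in a few dozen lines of Python. This is a minimal, illustrative implementation — the class name `ConsistentHashRing`, the MD5-based point function, and the `node#i` vnode labels are assumptions of this sketch, not any particular system’s scheme:

```python
import bisect
import hashlib

def _point(label: str) -> int:
    # Map a string to a point on the ring (0 .. 2^32 - 1).
    # MD5 is used here only for its uniform spread, not for security.
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:4], "big")

class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (ring point, node name)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each physical node gets `vnodes` positions on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_point(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def get(self, key: str) -> str:
        # First node clockwise from the key's position, wrapping at the top.
        # (Rebuilding the points list per lookup is O(N); a real
        # implementation would keep it cached.)
        points = [p for p, _ in self._ring]
        i = bisect.bisect(points, _point(key)) % len(self._ring)
        return self._ring[i][1]
```

Note the stability property: when a node joins, the only keys that change owner are those the new node’s ring positions capture — every migrated key lands on the new node, and everything else stays put.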

Tradeoff: AT3 — Simplicity vs. Flexibility (modular hashing is O(1) and trivial but breaks all sessions on cluster change; consistent hashing is O(log N) and complex but minimises migration)
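The “breaks all sessions” side of this tradeoff is easy to demonstrate. The sketch below (hypothetical `bucket` helper, MD5 chosen only for uniformity) counts how many keys change server under plain modular hashing when a fourth server joins a three-server cluster — a key keeps its bucket only when `h % 3 == h % 4`, which happens for roughly a quarter of keys:

```python
import hashlib

def bucket(key: str, n: int) -> int:
    # Plain modular hashing: hash the key, take it mod the server count.
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")
    return h % n

keys = [f"key{i}" for i in range(10_000)]
moved = sum(1 for k in keys if bucket(k, 3) != bucket(k, 4))
# Expect roughly three quarters of keys to land on a different server.
print(f"{moved / len(keys):.0%} of keys remapped")
```

Compare this with consistent hashing, where the same membership change migrates only about K/N of the keys (about a quarter here), all of them to the new node.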

Failure Mode: FM6 — Hotspotting (without virtual nodes, arc-length variance creates uneven load across nodes)
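The arc-length variance behind FM6 can be measured directly. This sketch (hypothetical `share` helper; labels and node names are illustrative) computes each node’s fraction of the ring — with one position per node the shares are wildly uneven, while with hundreds of vnodes per node they converge toward an even split:

```python
import hashlib

RING = 2**32  # size of the hash space

def _point(label: str) -> int:
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:4], "big") % RING

def share(nodes, vnodes):
    """Fraction of the ring each node owns, given `vnodes` positions per node."""
    pts = sorted((_point(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
    totals = dict.fromkeys(nodes, 0)
    prev = pts[-1][0] - RING  # predecessor of the first point, wrapped
    for p, n in pts:
        totals[n] += p - prev  # node at p owns the arc back to its predecessor
        prev = p
    return {n: t / RING for n, t in totals.items()}

for v in (1, 200):
    shares = share(["a", "b", "c", "d"], v)
    print(v, {n: round(f, 3) for n, f in shares.items()})
```

With `vnodes=1` a single unlucky hash can leave one node owning half the ring; with `vnodes=200` each node’s share is a sum of 200 small arcs, so the variance averages out.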

Signal: When node membership changes frequently (failures, autoscaling, rolling deployments) and session affinity or cache locality must survive those changes.

Maps to: Reference Book, Framework 8, Infrastructure Components (Cache Layer, Load Balancer)


Read in the book →