Consistent Hashing: The Algorithm Running Your Distributed Cache

Your distributed cache has 10 nodes. You add an 11th. With naive hashing, 90% of your cached data is now on the wrong node.

The functions h(key) mod 10 and h(key) mod 11 produce different results for most keys. Every misrouted request bypasses the cache and hits the database. At scale, this is not a cache miss. It is a self-inflicted denial of service.
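A quick way to check that figure. This is a throwaway sketch, not production code: the key names and the choice of MD5 are just for the demonstration.

```python
import hashlib

def bucket(key: str, n: int) -> int:
    """Map a key to one of n nodes with naive modulo hashing."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(1 for k in keys if bucket(k, 10) != bucket(k, 11))
print(f"{moved / len(keys):.0%} of keys changed node")  # roughly 90%
```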

The Concept

Consistent hashing places both nodes and keys on the same circular space.

Naive hashing:                    Consistent hashing:

h(key) mod N                      keys and nodes share a ring (0 to 2^32)

Add 1 node: N changes             Add 1 node: only adjacent keys migrate
~90% of keys misrouted            ~1/N keys migrate

              key ──→ nearest node clockwise
              ┌──────────────┐
         N=11 │              │ N=3
              │     ring     │
         N=8  │              │ N=5
              └──────────────┘
                    N=7

Hash the node identifiers onto a ring of positions 0 to 2^32. Hash each key onto the same ring. Walk clockwise from the key's position. The first node you hit owns that key.
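A minimal sketch of that lookup, assuming MD5 as the ring hash and a sorted list of node positions. The names (`ring_hash`, `HashRing`, `node_for`) are illustrative, not from any particular library.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    """Position on the ring: an integer in [0, 2^32)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

class HashRing:
    def __init__(self):
        self._positions = []   # sorted node positions on the ring
        self._nodes = {}       # position -> node name

    def add_node(self, node: str) -> None:
        pos = ring_hash(node)
        bisect.insort(self._positions, pos)
        self._nodes[pos] = node

    def node_for(self, key: str) -> str:
        # Walk clockwise: first node position at or after the key's position,
        # wrapping around to the start of the ring if necessary.
        pos = ring_hash(key)
        idx = bisect.bisect_right(self._positions, pos) % len(self._positions)
        return self._nodes[self._positions[idx]]
```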

Adding a node only affects keys between the new node and its predecessor on the ring. Remove a node and its keys migrate to the next clockwise neighbour. Roughly 1/N of the keys move instead of nearly all of them.
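Continuing with the hypothetical `HashRing` above, the migration claim can be checked directly. With single positions per node the exact fraction varies with where the new node lands; it averages about 1/N.

```python
# Add an 11th node and count how many keys change owner.
ring = HashRing()
for i in range(10):
    ring.add_node(f"node-{i}")

keys = [f"user:{i}" for i in range(100_000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-10")
moved = sum(1 for k in keys if ring.node_for(k) != before[k])
print(f"{moved / len(keys):.1%} of keys moved")  # a small fraction, not ~90%
```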

The insight is structural. Naive hashing couples every key's location to the total node count. Consistent hashing decouples them. Each key's position depends only on the key itself.

Production Mechanics

A bare consistent hashing ring creates uneven load. Nodes land at arbitrary positions. Some nodes own large arcs of the ring. Others own small arcs. One node might handle 40% of traffic while another handles 5%.

Virtual nodes fix this. Each physical node claims 100 to 200 positions on the ring. The positions spread across the full circle. Load distributes proportionally. When a physical node fails, its virtual positions scatter their keys across many other nodes instead of dumping everything onto one neighbour.
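One way virtual nodes could layer onto the same ring, reusing the `ring_hash` helper from the earlier sketch. The 150-position default and the `#vnode-i` naming are illustrative choices, not a prescribed scheme.

```python
import bisect

class VirtualNodeRing:
    def __init__(self, replicas: int = 150):
        self.replicas = replicas   # virtual positions per physical node
        self._positions = []       # sorted ring positions
        self._owner = {}           # ring position -> physical node

    def add_node(self, node: str) -> None:
        # Each physical node claims `replicas` positions spread over the ring.
        for i in range(self.replicas):
            pos = ring_hash(f"{node}#vnode-{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        # Freeing the virtual positions scatters their keys across many
        # successors instead of dumping everything onto one neighbour.
        for i in range(self.replicas):
            pos = ring_hash(f"{node}#vnode-{i}")
            idx = bisect.bisect_left(self._positions, pos)
            if idx < len(self._positions) and self._positions[idx] == pos:
                self._positions.pop(idx)
                del self._owner[pos]

    def node_for(self, key: str) -> str:
        pos = ring_hash(key)
        idx = bisect.bisect_right(self._positions, pos) % len(self._positions)
        return self._owner[self._positions[idx]]
```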

The quorum formula governs consistency. Set N replicas for each key. Require W successful writes before acknowledging. Require R successful reads before returning. When R + W > N, every read set shares at least one replica with every write set, so the read sees the latest acknowledged write.
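A worked instance of the rule: with N=3, W=2, R=2, the condition 2 + 2 > 3 holds, so any two replicas you read must include at least one of the two replicas that acknowledged the write.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every read quorum intersects every write quorum."""
    return r + w > n

print(quorums_overlap(3, 2, 2))  # True  -> reads overlap the latest write
print(quorums_overlap(3, 1, 1))  # False -> a read may miss the latest write
```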

Dynamo, Cassandra, and Riak all implement this design. The partition count is fixed at ring creation. The replication factor and quorum settings are tunable per operation. Read-heavy workloads use R=1, W=N. Write-heavy workloads use W=1, R=N.

The Failure Mode

Without virtual nodes, three physical nodes might cluster on one side of the ring. One node owns 60% of the key space. That node's CPU saturates while others idle. The system has capacity. It cannot use it.
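The skew is easy to see with the hypothetical `HashRing` from the earlier sketch: place three nodes at single positions and count how many keys each one owns. Node and key names are made up for the demonstration.

```python
from collections import Counter

ring = HashRing()
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)

ownership = Counter(ring.node_for(f"user:{i}") for i in range(100_000))
for node, count in ownership.most_common():
    print(node, f"{count / 100_000:.0%}")
# With only three positions, one node routinely owns well over a third of
# the keys; virtual nodes flatten this distribution.
```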

Virtual nodes reduce hotspot probability but do not eliminate it. Skewed key distributions — a celebrity user, a viral post — concentrate requests on whichever virtual node owns that key range. Monitoring per-virtual-node request rates is the only way to detect this before latency spikes.

The Tradeoff

A single database with a hash index is simpler. One machine, one hash table, no ring, no quorums. It works until it does not.

The consistent hashing ring eliminates the central coordinator. Any node can route any request by computing the hash locally. No master node. No routing table to synchronise. The cost is operational complexity: rebalancing, quorum tuning, and replica divergence during network partitions.

The Signal

You are distributing data or load across multiple nodes. You need to add or remove nodes without migrating most of the data. You need each node to route requests independently without a central lookup. Consistent hashing applies.


Concept: Consistent hashing

Tradeoff: AT5 — the ring eliminates the coordinator but adds quorum management and replica divergence during partitions

Failure Mode: FM6 — without virtual nodes, uneven ring placement concentrates load on a single physical node

Signal: You need to add or remove nodes from a cluster without rehashing most keys

Series: Book 2, Ch 15

*The full framework treatment — compression blocks, three-level exercises, and the complete AT/FM mapping — is in Book 2, Ch 15. Free chapters available at [computingseries.com](https://computingseries.com).*