The Computing Series

Naive Approach and Why It Fails

The naive approach is a hash table: node = hash(key) % N where N is the number of nodes. This works until a node is added or removed. When N changes, almost every key maps to a different node. All cached data is invalid simultaneously. All clients must be updated. The system experiences a thundering herd as all clients reload from the database.

This is the modulo problem: simple hash-based distribution is not resilient to topology changes. At scale, topology changes are routine — nodes fail, capacity is added, hardware is replaced. An algorithm that invalidates the entire key space on every topology change is unusable.

Read in the book →