Where It Fails

FM6 — Hotspotting: Range partitioning on a monotonically increasing key (a timestamp, an auto-increment ID) sends all new writes to the partition that owns the current end of the key range. That hot shard receives 100% of insert traffic and exhausts its CPU and I/O while the other shards sit idle. The system appears to have plenty of capacity in aggregate while one partition is saturated. Aggregate observability hides the problem: metrics such as average CPU and average query latency look healthy because they average across hot and cold shards. Only per-shard metrics reveal it.
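The masking effect is easy to reproduce in a few lines. The following is a minimal sketch, not a model of any particular database: the split points in BOUNDARIES, the per-shard CAPACITY, and the helper names shard_for and write_counts are all assumptions chosen for illustration.

```python
from bisect import bisect_right
from collections import Counter
from statistics import mean

# Hypothetical split points: shard 0 owns keys < 1000, shard 1 owns
# [1000, 2000), shard 2 owns [2000, 3000), shard 3 owns everything >= 3000.
BOUNDARIES = [1000, 2000, 3000]
NUM_SHARDS = len(BOUNDARIES) + 1

# Assumed number of writes one shard can absorb per interval.
CAPACITY = 500


def shard_for(key: int) -> int:
    """Range partitioning: return the shard whose key range contains `key`."""
    return bisect_right(BOUNDARIES, key)


def write_counts(keys) -> list[int]:
    """Count how many writes land on each shard."""
    counts = Counter(shard_for(k) for k in keys)
    return [counts.get(shard, 0) for shard in range(NUM_SHARDS)]


# Monotonically increasing keys (for example auto-increment IDs): every new
# key is larger than the last split point, so all of them hit the last shard.
new_keys = range(3000, 3400)

per_shard = write_counts(new_keys)
utilization = [count / CAPACITY for count in per_shard]

print("per-shard writes:   ", per_shard)
print("average utilization:", mean(utilization))
print("max utilization:    ", max(utilization))
```

With these assumed numbers, the average utilization is 20% while the last shard runs at 80%. An alert on the average would stay quiet, whereas an alert on the per-shard maximum would fire.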

FM3 — Unbounded Resource Consumption during Rebalancing: When a new node joins a cluster partitioned by hash(key) mod N, changing N remaps nearly all keys, so nearly all data moves at once. The rebalancing traffic competes with live production traffic for network bandwidth, disk I/O, and CPU. On a large cluster, rebalancing can last hours. A second node failure during that window, which is more likely because the cluster is already stressed, can leave the cluster in an inconsistent state. Consistent hashing cuts the data movement to roughly 1/N of the total, but even 1/N of a petabyte-scale dataset is substantial. Rebalancing must be rate-limited, and the cluster must continue serving traffic throughout.
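The gap between the two schemes can be measured directly. The sketch below is illustrative only: it assumes a SHA-256-based hash, 100 virtual nodes per physical node, and synthetic keys of the form user:i, and the helper names (stable_hash, modulo_owner, make_ring_lookup, moved_fraction) are hypothetical. It counts how many keys change owner when a cluster grows from 4 to 5 nodes.

```python
import hashlib
from bisect import bisect_right


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash (the built-in hash() is salted per process)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def modulo_owner(key: str, num_nodes: int) -> int:
    """Naive placement: hash(key) mod N."""
    return stable_hash(key) % num_nodes


def make_ring_lookup(num_nodes: int, vnodes: int = 100):
    """Build a consistent-hash ring and return a key -> node lookup function.

    Each node is placed at `vnodes` points on the ring; a key belongs to the
    node that owns the first point clockwise from the key's hash.
    """
    ring = sorted(
        (stable_hash(f"node-{node}#{v}"), node)
        for node in range(num_nodes)
        for v in range(vnodes)
    )
    points = [point for point, _ in ring]
    owners = [node for _, node in ring]

    def owner(key: str) -> int:
        # Wrap around to the first point when the hash is past the last one.
        idx = bisect_right(points, stable_hash(key)) % len(points)
        return owners[idx]

    return owner


def moved_fraction(owner_before, owner_after, keys) -> float:
    """Fraction of keys whose owning node differs before and after."""
    moved = sum(owner_before(k) != owner_after(k) for k in keys)
    return moved / len(keys)


N = 4  # assumed cluster size before the new node joins
keys = [f"user:{i}" for i in range(20_000)]

modulo_moved = moved_fraction(
    lambda k: modulo_owner(k, N),
    lambda k: modulo_owner(k, N + 1),
    keys,
)
ring_moved = moved_fraction(make_ring_lookup(N), make_ring_lookup(N + 1), keys)

print(f"modulo placement: {modulo_moved:.1%} of keys moved")
print(f"consistent hash:  {ring_moved:.1%} of keys moved")
```

Under these assumptions, modulo placement should move about four fifths of the keys, since a key stays put only when its hash gives the same remainder mod 4 and mod 5. The ring should move about one fifth, all of it onto the new node. The exact ring figure varies with the number of virtual nodes. Neither number says anything about pacing: the moved data still has to be copied under a rate limit while the cluster serves traffic.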
