The Computing Series

Introduction

In 2009, Twitter’s database was a single MySQL instance. On election night that year, it fell over. The engineering team watched the site go down as millions of users posted simultaneously — a workload the single database could not absorb. They called it the “fail whale” era. The solution was not a faster database. It was cutting the database into pieces and spreading those pieces across machines.

Partitioning — dividing a single large dataset into smaller, independently stored and queried subsets — is the architectural act that makes databases scale horizontally. Without it, every byte of data must eventually fit on a single machine, and every query must go through a single CPU. With it, data is distributed across dozens or hundreds of machines, and queries can be answered in parallel.
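The core mechanism can be sketched in a few lines. This is a minimal illustration, not any specific database's implementation: it assumes simple hash-based placement, where a stable hash of the key decides which shard owns the record, so every node computes the same mapping independently.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Deterministically map a key to one of num_shards partitions."""
    # A stable hash (not Python's randomized hash()) so every node agrees.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# A point query for one key touches exactly one shard;
# a full scan must fan out to all of them.
shard = shard_for("user:42", num_shards=8)
```

A lookup routes to a single machine, which is what allows reads and writes to proceed in parallel across the cluster; the cost, as the following sections show, is that this mapping is now a design decision with consequences.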

The problem, as Twitter learned, is that partitioning is not free. Every design decision about how to split the data introduces a different set of failure modes and operational constraints. Choosing the wrong scheme does not just reduce performance — it moves the bottleneck from one place to another, often invisibly, until production traffic reveals it catastrophically.

Three things follow from this: the three dominant partitioning strategies, the hot-shard problem, and the rebalancing problem that appears whenever a cluster grows. Each introduces a different failure surface.
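The rebalancing problem is easy to demonstrate. Under naive hash-mod placement (an assumption for illustration, not a scheme the book endorses), adding even one shard changes the divisor, so almost every key maps to a new home and must be physically moved:

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Naive placement: stable hash of the key, modulo the shard count."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Growing the cluster from 8 to 9 shards changes the divisor,
# so most keys land on a different shard and have to migrate.
keys = [f"user:{i}" for i in range(10_000)]
moved = sum(shard_for(k, 8) != shard_for(k, 9) for k in keys)
fraction = moved / len(keys)
```

With mod-N placement, roughly (N)/(N+1) of the keys move on a one-shard expansion, which is why production systems reach for schemes like consistent hashing or fixed virtual partitions instead.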


Read in the book →