The Concept

Partitioning (also called sharding) divides a dataset into disjoint subsets, each stored on a separate node. Every record belongs to exactly one partition. A routing layer determines which partition owns a given record.

Three strategies dominate production systems:

Strategy	Mechanism	Strengths	Weaknesses
Range partitioning	Split by key range (A–M on shard 1, N–Z on shard 2)	Range queries are efficient; data locality	Sequential writes create hot shards
Hash partitioning	`shard = hash(key) % N`	Even distribution of random keys	No range queries; rebalancing is O(N)
Consistent hashing	Keys and nodes on a ring; key goes to next clockwise node	O(1/N) keys move when a node joins	Requires virtual nodes for balance

The routing map — the data structure that tells a client which partition owns a given key — is itself a scaling concern. It can be stored centrally (a single configuration server) or distributed (embedded in the client library). Centralised routing is simpler but is a single point of failure. Distributed routing is resilient but requires clients to handle routing table updates.

Read in the book →

← Thread Activation How It Works →