The Computing Series

The Concept

Partitioning (also called sharding) divides a dataset into disjoint subsets, each stored on a separate node. Every record belongs to exactly one partition. A routing layer determines which partition owns a given record.

Three strategies dominate production systems:

Strategy Mechanism Strengths Weaknesses
Range partitioning Split by key range (A–M on shard 1, N–Z on shard 2) Range queries are efficient; data locality Sequential writes create hot shards
Hash partitioning shard = hash(key) % N Even distribution of random keys No range queries; rebalancing is O(N)
Consistent hashing Keys and nodes on a ring; key goes to next clockwise node O(1/N) keys move when a node joins Requires virtual nodes for balance

The routing map — the data structure that tells a client which partition owns a given key — is itself a scaling concern. It can be stored centrally (a single configuration server) or distributed (embedded in the client library). Centralised routing is simpler but is a single point of failure. Distributed routing is resilient but requires clients to handle routing table updates.


Read in the book →