Partitioning (also called sharding) divides a dataset into disjoint subsets, each stored on a separate node. Every record belongs to exactly one partition. A routing layer determines which partition owns a given record.
Three strategies dominate production systems:
| Strategy | Mechanism | Strengths | Weaknesses |
|---|---|---|---|
| Range partitioning | Split by key range (A–M on shard 1, N–Z on shard 2) | Range queries are efficient; data locality | Sequential writes create hot shards |
| Hash partitioning | shard = hash(key) % N |
Even distribution of random keys | No range queries; rebalancing is O(N) |
| Consistent hashing | Keys and nodes on a ring; key goes to next clockwise node | O(1/N) keys move when a node joins | Requires virtual nodes for balance |
The routing map — the data structure that tells a client which partition owns a given key — is itself a scaling concern. It can be stored centrally (a single configuration server) or distributed (embedded in the client library). Centralised routing is simpler but is a single point of failure. Distributed routing is resilient but requires clients to handle routing table updates.