Range partitioning assigns contiguous ranges of the key space to each shard. A routing table maps ranges to nodes.
// Range partition routing table — half-open ranges [min, max),
// so keys between letters (e.g. "Moo", which sorts after "M") are covered
routing_table = [
    {min: "A", max: "N", node: shard_1},
    {min: "N", max: "~", node: shard_2},  // "~" sorts after all letters
]

function route(key):
    for range in routing_table:
        if range.min <= key < range.max:
            return range.node
    raise RoutingError("key out of range")
// Range scan: efficient — all keys in a range live on the same shard
function range_scan(start_key, end_key):
    affected_shards = routing_table.filter(overlaps(start_key, end_key))
    results = []
    for shard in affected_shards:
        results += shard.scan(start_key, end_key)
    return merge_sorted(results)
The problem: if keys are generated sequentially (timestamps, auto-increment IDs), all writes go to the shard that owns the current key range. That shard receives 100% of write traffic while the others sit idle.
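The hotspot is easy to demonstrate in a few lines of Python. This is a sketch with illustrative shard names and a two-range alphabetical routing table (half-open ranges), not any particular system's API:

```python
from collections import Counter

# Illustrative two-shard routing table with half-open ranges [min, max).
# "~" sorts after all ASCII letters, acting as an upper sentinel.
ROUTING_TABLE = [
    ("A", "N", "shard_1"),
    ("N", "~", "shard_2"),
]

def route(key: str) -> str:
    for lo, hi, node in ROUTING_TABLE:
        if lo <= key < hi:
            return node
    raise KeyError(f"key out of range: {key}")

# Sequentially generated keys share a common prefix, so they all fall
# into the same range: every single write lands on shard_2.
writes = Counter(route(f"order_{i:06d}") for i in range(1000))
# writes == Counter({"shard_2": 1000})
```

With a real timestamp or auto-increment key the effect is the same: the shared prefix pins every new key to one range, and therefore to one shard.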
Hash partitioning maps each key to a shard by applying a hash function.
num_shards = 4

function route(key):
    shard_index = hash(key) % num_shards
    return shards[shard_index]
// Write: distributed evenly for random keys
function write(key, value):
    shard = route(key)
    shard.put(key, value)

// Point lookup: O(1) — hash the key, go directly to the shard
function read(key):
    shard = route(key)
    return shard.get(key)
// Range scan: NOT possible without scanning all shards
// Adjacent keys hash to arbitrary shards — there is no locality
function range_scan(start_key, end_key):
    results = []
    for shard in shards: // must scan ALL shards
        results += shard.scan_range(start_key, end_key)
    return merge_sorted(results)
The rebalancing problem: when num_shards changes from 4 to 5, the formula hash(key) % num_shards gives a different result for almost every key. Nearly all data must be moved.
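The cost is easy to quantify. In this sketch, integer keys stand in for hash values (so hash(k) = k), which is enough to count how many assignments change:

```python
# How much data moves when num_shards goes from 4 to 5 under
# modulo hashing? Count the keys whose shard assignment changes.
keys = range(10_000)
moved = sum(1 for k in keys if k % 4 != k % 5)
fraction_moved = moved / len(keys)
# fraction_moved == 0.8 — 80% of all keys must be migrated
```

The assignments k % 4 and k % 5 agree only when k mod 20 is 0, 1, 2, or 3, i.e. for 4 keys in every 20, so exactly 80% of keys move.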
Consistent hashing places both keys and nodes on a circular ring (hash space 0 to 2^32 − 1). A key is assigned to the first node clockwise from its hash position.
// Ring with virtual nodes for balance
ring = SortedMap() // maps hash positions to node identifiers

function add_node(node, virtual_node_count=150):
    for i in 0..virtual_node_count:
        position = hash(node.id + ":" + i)
        ring[position] = node

function route(key):
    key_position = hash(key)
    // Find the first node clockwise from key_position
    node_position = ring.find_first_ge(key_position)
    if node_position is None:
        node_position = ring.first() // wrap around
    return ring[node_position]
// When a new node joins, only the keys between its predecessor
// on the ring and the new node itself move.
// Expected fraction of keys that move = 1 / (num_nodes + 1)
function add_node_and_rebalance(new_node):
    add_node(new_node)
    predecessor = ring.predecessor(new_node.position)
    // Only keys in (predecessor, new_node] need to move
    migrate_keys(from: next_node_after(new_node),
                 to:   new_node,
                 key_range: (predecessor.position, new_node.position])
When a node joins a ring of N nodes, only about 1/(N + 1) of the keys move on average — compared to near-total movement for modulo hashing. Cassandra, Amazon DynamoDB, and Riak use consistent hashing.
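A compact runnable version of the ring makes the claim checkable. The class and function names here are my own, and MD5 stands in for the ring's hash function; this is a sketch, not any production implementation:

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    # Position on a 32-bit ring
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 32)

class Ring:
    def __init__(self, vnodes: int = 150):
        self.vnodes = vnodes
        self.positions: list[int] = []   # sorted vnode positions
        self.owner: dict[int, str] = {}  # position -> node id

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = ring_hash(f"{node}:{i}")
            bisect.insort(self.positions, pos)
            self.owner[pos] = node

    def route(self, key: str) -> str:
        # First vnode clockwise from the key's position
        idx = bisect.bisect_right(self.positions, ring_hash(key))
        if idx == len(self.positions):
            idx = 0  # wrap around the ring
        return self.owner[self.positions[idx]]

ring = Ring()
for n in ["node_1", "node_2", "node_3", "node_4"]:
    ring.add_node(n)

keys = [f"key_{i}" for i in range(10_000)]
before = {k: ring.route(k) for k in keys}
ring.add_node("node_5")
moved = sum(1 for k in keys if ring.route(k) != before[k])
# moved / 10_000 comes out near 1/5 — the keys node_5 now owns —
# versus ~80% for modulo hashing in the earlier example
```

The 150 virtual nodes per physical node keep the per-node share of the ring close to its expected value; with a single position per node, the variance in range sizes would be large.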
A hot shard occurs when one partition receives disproportionate traffic — reads, writes, or both.
// Scenario: celebrity user with 50 million followers
// All reads for that user's feed route to the same shard
function get_feed(user_id):
    shard = route(user_id) // always shard_3 for user "celebrity_A"
    return shard.read_feed(user_id)

// Shard 3 receives millions of requests/second
// Other shards sit at 5% utilisation
// Mitigation 1: key splitting — append a random suffix
function write_celebrity_data(user_id, data):
    suffix = random_int(0, 9)
    shard_key = user_id + ":" + suffix // spreads across 10 shards
    route(shard_key).put(shard_key, data)
// Trade-off: reads must now fan out to all 10 suffixed keys
// Mitigation 2: read replicas for hot keys
function read_hot_key(key):
    replicas = hot_key_registry.get_replicas(key)
    chosen = replicas[random_int(0, len(replicas) - 1)]
    return chosen.read(key)
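Key splitting trades write balance for read fan-out: a value written under a random suffix can be under any of the 10 suffixed keys, so every read must query all suffixes and merge the partial results. A minimal sketch of both sides, using an in-memory dict as a stand-in for the shards and hypothetical function names:

```python
import random

NUM_SUFFIXES = 10

# Toy stand-in for the sharded store: suffixed key -> list of items
store: dict[str, list[str]] = {}

def write_split(user_id: str, item: str, rng: random.Random) -> None:
    # Write side: pick a random suffix so writes spread across shards
    suffix = rng.randrange(NUM_SUFFIXES)
    store.setdefault(f"{user_id}:{suffix}", []).append(item)

def read_split(user_id: str) -> list[str]:
    # Read side: fan out to every suffixed key and merge
    results: list[str] = []
    for suffix in range(NUM_SUFFIXES):
        results.extend(store.get(f"{user_id}:{suffix}", []))
    return sorted(results)

rng = random.Random(42)
for i in range(100):
    write_split("celebrity_A", f"post_{i:03d}", rng)
posts = read_split("celebrity_A")
# All 100 posts come back, regardless of which suffix each landed on
```

Because of this 10x read amplification, key splitting suits write-hot keys, while the read-replica approach (Mitigation 2) suits read-hot keys.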