Exercises

Level 1 — Understand

What property of a hash function makes Git’s content-addressable storage work? What would break if two different files could produce the same hash?
Explain the difference between modular hashing (hash(key) % N) and consistent hashing. In which scenario does modular hashing break, and what is the consequence?
What problem do virtual nodes solve in a consistent hash ring? What is the cost of using more virtual nodes?

Level 2 — Apply

A consistent hash ring has three nodes: A, B, C. Their positions on the ring (0–99) are: A=10, B=45, C=72.

Which node handles keys that hash to: 5, 30, 60, 80?
Node B is removed. Which keys migrate, and to which node?
Node D is added at position 55. Which keys migrate, and to which node?

Level 3 — Design

You are designing the routing layer for a distributed session store. The session store has 10 nodes. Sessions must be sticky — a user’s session must always be routed to the same node for the duration of the session. The node pool changes: nodes fail ~2 times per month and are replaced; capacity scales up by adding 2–3 nodes per quarter.

Design the routing strategy. Choose between modular hashing and consistent hashing. Name the tradeoff explicitly.
How many virtual nodes per physical node would you configure? What determines this number?
When a node fails mid-session, some sessions are lost. Propose a strategy to reduce session loss on node failure without breaking the consistent hashing model. Name every tradeoff you introduce.

A complete answer will: (1) correctly choose consistent hashing over modular hashing with quantified reasoning about session disruption on node change (O(1/n) vs O(n) sessions remapped), (2) identify FM1 (single point of failure — the failed node) and FM12 (network partition) as the primary risks under node failure, (3) address the AT5 tradeoff between the number of virtual nodes (higher = better load balance, more memory overhead) and stability, and (4) propose a session replication strategy (e.g., write sessions to the successor node as well) that reduces loss without a full re-route of all sessions.

Read in the book →

← Real Systems