The Computing Series

Failure Modes in This System

FM12 — Split-Brain: When a network partition separates nodes into two groups, each group may accept writes to the same key. When the partition heals, both versions exist. In a system that stays available for writes during a partition, split-brain cannot be fully prevented; it is the inherent cost of choosing availability in the Consistency/Availability tradeoff. The mitigation is to detect it (vector clocks surface concurrent, conflicting writes) and to resolve it deterministically, so that every node converges on the same value.
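A minimal sketch of detection and deterministic resolution, in Python. It assumes each stored value carries a vector clock (a dict from node id to a counter) and the id of the node that accepted the write. The names `compare`, `merge_clocks`, `reconcile`, and `Version`, and the tie-break rule (highest writer id wins), are hypothetical choices for illustration, not the book's implementation. Any resolution rule works as long as every node applies the same one.

```python
from dataclasses import dataclass
from typing import Dict

# Assumed representation: a vector clock maps node id -> count of writes that node accepted.
VectorClock = Dict[str, int]


def compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'equal', 'before' (a happened before b), 'after', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither clock dominates: the writes conflict (split-brain)


def merge_clocks(a: VectorClock, b: VectorClock) -> VectorClock:
    # Element-wise maximum: the merged clock dominates both inputs.
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


@dataclass
class Version:
    value: str
    clock: VectorClock
    writer: str  # node that accepted the write; used only as a tie-breaker


def reconcile(x: Version, y: Version) -> Version:
    order = compare(x.clock, y.clock)
    if order in ("after", "equal"):
        return x
    if order == "before":
        return y
    # Concurrent writes. Hypothetical deterministic rule: the higher writer id wins,
    # with the value as a secondary key so argument order never changes the result.
    winner = max(x, y, key=lambda v: (v.writer, v.value))
    return Version(winner.value, merge_clocks(x.clock, y.clock), winner.writer)
```

Two writes accepted on opposite sides of a partition, such as clocks `{"n1": 2, "n2": 1}` and `{"n1": 1, "n2": 2}`, compare as `"concurrent"`. `reconcile` returns the same winner regardless of argument order, with a merged clock that dominates both, so later writes are ordered after the conflict. A last-writer-wins rule like this one silently discards a value; systems that cannot lose writes instead return both versions to the application to merge.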

FM6 — Hotspotting: Virtual nodes mitigate but do not eliminate hotspotting. They spread key ranges evenly across physical nodes, but every request for a single key still routes to that key's owners. A key accessed by millions of clients simultaneously, such as a trending topic or a celebrity’s profile, therefore concentrates load on the nodes that own it, however evenly the ring is distributed. Application-level caching or request deduplication is the mitigation.
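The sketch below shows why virtual nodes cannot help a single hot key: distinct keys spread across all nodes, but every lookup of one key lands on the same owner. It assumes a hypothetical `Ring` class (consistent hashing with MD5-derived positions and 100 virtual nodes per physical node) and a hypothetical `CachingClient` that illustrates the application-level caching mitigation; neither is the book's code.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(s: str) -> int:
    # Stable 64-bit position on the ring; MD5 is used only for a deterministic, uniform spread.
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes_per_node=100):
        # Each physical node appears vnodes_per_node times at pseudo-random positions.
        self._points = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._positions = [pos for pos, _ in self._points]

    def owner(self, key: str) -> str:
        # First virtual node clockwise from the key's position, wrapping past the end.
        idx = bisect.bisect(self._positions, ring_position(key)) % len(self._positions)
        return self._points[idx][1]


class CachingClient:
    """Hypothetical client that answers repeat reads from a local cache."""

    def __init__(self, ring: Ring):
        self.ring = ring
        self.cache = {}
        self.node_load = Counter()  # requests that actually reached each storage node

    def get(self, key: str) -> str:
        if key in self.cache:
            return self.cache[key]
        node = self.ring.owner(key)
        self.node_load[node] += 1
        value = f"{key}@{node}"  # stand-in for a network read from the owning node
        # No TTL or eviction here; a real cache needs expiry, which bounds staleness (see FM4).
        self.cache[key] = value
        return value


if __name__ == "__main__":
    ring = Ring(["a", "b", "c", "d"])
    spread = Counter(ring.owner(f"user:{i}") for i in range(10_000))
    hot = Counter(ring.owner("celebrity:42") for _ in range(10_000))
    print("distinct keys:", dict(spread))
    print("one hot key:  ", dict(hot))
```

All 10,000 lookups of `celebrity:42` resolve to one node no matter how many virtual nodes the ring has; through `CachingClient`, only the first of them reaches a storage node. The cache trades load for freshness, which is exactly the staleness problem described under FM4.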

FM4 — Data Consistency Failure: Eventual consistency means reads can return stale data. If an application assumes strong consistency where the storage layer provides only eventual consistency, it will produce incorrect results. Every deployment that relies on eventual consistency therefore needs explicit staleness bounds and client-side handling of stale reads.
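One way to make staleness bounds explicit, sketched under assumptions: each replica reports a per-key `version` and the time `applied_at` when it last applied replicated writes, and the client accepts a read only if that time is recent enough and the version is at least one it already knows about (for example, its own last write). `ReplicaState`, `bounded_read`, and `StaleReadError` are hypothetical names; real stores expose these signals differently, if at all.

```python
import time
from dataclasses import dataclass


@dataclass
class ReplicaState:
    name: str
    value: str
    version: int       # assumed: monotonically increasing per key
    applied_at: float  # assumed: wall-clock time the replica last applied replicated writes


class StaleReadError(Exception):
    """Raised when no replica can answer within the staleness bound."""


def bounded_read(replicas, max_staleness_s, min_version=0, now=time.time):
    """Return the first replica answer that is fresh enough; raise if none is."""
    current = now()
    for r in replicas:
        # Writes after applied_at may be missing, so its age approximates replication lag.
        lag = current - r.applied_at
        if lag <= max_staleness_s and r.version >= min_version:
            return r
    raise StaleReadError(
        f"no replica within {max_staleness_s}s with version >= {min_version}"
    )
```

Raising an error instead of silently returning old data forces the caller to choose how to handle the stale read: retry, read from a more authoritative replica, or knowingly show a possibly stale value.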
