The Computing Series

F3 — The 12 Failure Modes

| #    | Failure Mode                   | Recall Trigger                                          |
|------|--------------------------------|---------------------------------------------------------|
| FM1  | Single Point of Failure        | One component down = system down                        |
| FM2  | Cascading Failures             | One failure triggers the next                           |
| FM3  | Unbounded Resource Consumption | Memory / connections / threads exhausted                |
| FM4  | Data Consistency Failure       | Systems disagree on the state of the world              |
| FM5  | Latency Amplification          | Small latencies × many hops = large total               |
| FM6  | Hotspotting                    | One node gets all the traffic                           |
| FM7  | Thundering Herd                | Many clients retry simultaneously, overwhelming recovery |
| FM8  | Schema / Contract Violation    | One side changes; the other side breaks                 |
| FM9  | Silent Data Corruption         | Incorrect data propagates without alerts                |
| FM10 | Security Breach                | Unauthorised access                                     |
| FM11 | Observability Blindness        | System failing; team cannot see why                     |
| FM12 | Split-Brain                    | Two nodes each think they are primary                   |
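
To make the FM5 recall trigger concrete, here is a minimal sketch that sums per-hop latency over a sequential call chain. The hop count and the 20 ms per-hop figure are assumed numbers chosen for illustration, and `total_latency_ms` is a hypothetical helper name.

```python
def total_latency_ms(hop_latencies_ms):
    """Total latency of a sequential call chain: the sum of every hop."""
    return sum(hop_latencies_ms)


# Hypothetical chain: 10 sequential hops at 20 ms each (assumed values).
hops = [20] * 10
print(total_latency_ms(hops))  # 200
```

Each hop looks cheap on its own, but a request that crosses ten of them in sequence pays for all ten.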

Use: Pre-mortem — run this list against each component. Post-mortem — name the failure mode first; the prevention follows.
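
One way to run the list against each component is to generate a checklist with one row per component and failure mode. The sketch below assumes a simple in-memory list of component names; `premortem_checklist` and the component names are hypothetical and only illustrate the idea.

```python
from itertools import product

# The 12 failure modes from the table above, keyed by ID.
FAILURE_MODES = {
    "FM1": "Single Point of Failure",
    "FM2": "Cascading Failures",
    "FM3": "Unbounded Resource Consumption",
    "FM4": "Data Consistency Failure",
    "FM5": "Latency Amplification",
    "FM6": "Hotspotting",
    "FM7": "Thundering Herd",
    "FM8": "Schema / Contract Violation",
    "FM9": "Silent Data Corruption",
    "FM10": "Security Breach",
    "FM11": "Observability Blindness",
    "FM12": "Split-Brain",
}


def premortem_checklist(components):
    """Return one (component, fm_id, fm_name) row per component and failure mode."""
    return [
        (component, fm_id, fm_name)
        for component, (fm_id, fm_name) in product(components, FAILURE_MODES.items())
    ]


# Hypothetical component names, for illustration only.
for component, fm_id, fm_name in premortem_checklist(["api-gateway", "orders-db"]):
    print(f"{component}: {fm_id} {fm_name}?")
```

Two components yield 24 questions, each of which the team answers before launch. The same dictionary can serve the post-mortem direction: look up the ID first, then discuss prevention.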

