The Computing Series

What Goes Wrong

The most common failure is starting at the symptom instead of the archetype. Engineers who trace the stack from the error log are using depth-first traversal when the failure is architectural. They find technically correct information — the call stack, the failing service, the log message — that does not explain what produced the failure.
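The difference between the two traversals can be made concrete with a toy example. The sketch below is illustrative only: the service names and the dependency graph are hypothetical, not taken from any real system. It shows what each reading order looks at first when starting from the same entry point.

```python
from collections import deque

# Hypothetical dependency graph: each service maps to the services it calls.
GRAPH = {
    "gateway": ["orders", "search"],
    "orders": ["orders-db", "payments"],
    "search": ["search-index"],
    "payments": ["payments-db"],
    "orders-db": [],
    "search-index": [],
    "payments-db": [],
}


def depth_first(graph, start):
    """Follow one call chain to the bottom before looking at any sibling."""
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so the first-listed dependency is visited first.
        stack.extend(reversed(graph[node]))
    return order


def breadth_first(graph, start):
    """Survey every component at one distance before going deeper."""
    order, seen, queue = [start], {start}, deque([start])
    while queue:
        node = queue.popleft()
        for dep in graph[node]:
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


dfs_order = depth_first(GRAPH, "gateway")
bfs_order = breadth_first(GRAPH, "gateway")
```

In this toy graph the depth-first reader has visited five components along the orders chain before seeing that a search path exists at all, while the breadth-first reader knows the top-level shape after the first three. That early view of the shape is the information the archetype-first approach is after.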

FM11 (Observability Blindness) is both a cause and an effect here. Systems without good observability are harder to read under pressure. But observability blindness is also a product of failing to read the system: engineers who do not understand their systems cannot instrument them correctly. The missing metric is almost always the one the system builder did not think would matter.

When frameworks disagree — the archetype suggests a caching layer should be present, but the review questions reveal there is none — the disagreement is information. It means either the archetype is wrong (the system is not what it appears to be) or a known best practice was consciously skipped (and there is a reason worth finding). Both possibilities change the failure model.
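One way to make such a disagreement visible is to write down what the archetype predicts and compare it with what the review actually found. The following is a minimal sketch under stated assumptions: the archetype name, the expected-component table, and the observed set are all hypothetical placeholders, not part of the frameworks themselves.

```python
# Hypothetical table: components each archetype is usually expected to include.
ARCHETYPE_EXPECTATIONS = {
    "read-heavy web service": {
        "load balancer",
        "cache",
        "primary database",
        "read replica",
    },
}


def framework_disagreements(archetype, observed):
    """Compare the archetype's predicted components with the observed ones.

    Returns both directions of mismatch, since either one is a finding.
    """
    expected = ARCHETYPE_EXPECTATIONS[archetype]
    return {
        "expected_but_absent": sorted(expected - observed),
        "present_but_unexpected": sorted(observed - expected),
    }


# Hypothetical result of walking the review questions against a real system.
observed = {"load balancer", "primary database", "read replica", "message queue"}
result = framework_disagreements("read-heavy web service", observed)
print(result)
```

Each entry in the result is a question rather than a verdict. An absent cache may mean the archetype was misidentified, or that caching was deliberately skipped for a reason worth finding.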

Concept: Reading a System You Didn’t Build
Thread: T12 (Tradeoffs) ← reading breadth-first vs depth-first → right traversal for the failure type
Core Idea: Archetype identification before code reading; the seven review questions in order; failure mode enumeration from the architecture, not the log.
Tradeoff: AT9, completeness of system model vs speed of first useful hypothesis
Failure Mode: FM11, observability blindness; missing the metric that would have made the failure obvious
Signal: When an on-call engineer cannot form a hypothesis within ten minutes, they are depth-first on symptoms; redirect to the archetype and the first three review questions
Maps to: Reference Book, Frameworks 5, 6, 7
