What Goes Wrong

The most common failure is starting at the symptom instead of the archetype. Engineers who trace the stack from the error log are traversing depth-first when the failure is architectural. What they find (the call stack, the failing service, the log message) is technically correct but does not explain what produced the failure.

FM11 (Observability Blindness) is both a cause and an effect here. As a cause, systems without good observability are harder to read under pressure. As an effect, it is itself a product of failing to read the system: engineers who do not understand their systems cannot instrument them correctly. The missing metric is almost always the one the system's builder did not think would matter.

When frameworks disagree, the disagreement is information. Suppose the archetype suggests a caching layer should be present, but the review questions reveal there is none. Either the archetype is wrong (the system is not what it appears to be) or a known best practice was consciously skipped (and there is a reason worth finding). Either possibility changes the failure model.
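One way to make this kind of disagreement explicit is to diff what an archetype leads you to expect against what the review actually found. The sketch below is illustrative only: the archetype name, the component names, and the `compare` helper are all hypothetical and do not come from the book's frameworks.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical catalogue: what each archetype leads a reader to expect.
# The archetype and component names are assumptions for illustration.
ARCHETYPE_EXPECTATIONS: dict[str, set[str]] = {
    "read-heavy-web-service": {"load_balancer", "app_server", "cache", "primary_db"},
}


@dataclass(frozen=True)
class Disagreement:
    component: str
    kind: str      # "expected_but_absent" or "present_but_unexpected"
    question: str  # what the reviewer should find out next


def compare(archetype: str, observed: set[str]) -> list[Disagreement]:
    """Return every mismatch between the archetype and the observed system."""
    expected = ARCHETYPE_EXPECTATIONS[archetype]
    findings: list[Disagreement] = []
    # Components the archetype predicts but the review did not find.
    for name in sorted(expected - observed):
        findings.append(Disagreement(
            name,
            "expected_but_absent",
            f"Is the archetype wrong, or was '{name}' deliberately skipped, and why?",
        ))
    # Components the review found that the archetype does not predict.
    for name in sorted(observed - expected):
        findings.append(Disagreement(
            name,
            "present_but_unexpected",
            f"Does '{name}' mean the system is not really this archetype?",
        ))
    return findings


if __name__ == "__main__":
    observed = {"load_balancer", "app_server", "primary_db"}  # no cache found
    for d in compare("read-heavy-web-service", observed):
        print(f"{d.kind}: {d.component} -> {d.question}")
```

Each finding is phrased as a question rather than a verdict, because the mismatch by itself does not say which of the two explanations holds.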

Concept: Reading a System You Didn’t Build
Thread: T12 (Tradeoffs) ← reading breadth-first vs depth-first → right traversal for the failure type
Core Idea: Archetype identification before code reading; the seven review questions in order; failure mode enumeration from the architecture, not the log.
Tradeoff: AT9 (completeness of system model vs speed of first useful hypothesis)
Failure Mode: FM11 (observability blindness); missing the metric that would have made the failure obvious
Signal: When an on-call engineer cannot form a hypothesis within ten minutes, they are depth-first on symptoms; redirect to the archetype and the first three review questions
Maps to: Reference Book, Frameworks 5, 6, 7
