F1 — The 12 Mental Models

Why This Framework Exists

Two engineers look at a slow API endpoint. The first thinks: the database query is unoptimised. The second thinks: the data this endpoint returns is recomputed on every call and never cached. Both look at the same system. They see different problems because they are using different mental models.

Mental models are not just thinking styles. They are the patterns your attention defaults to — what you notice first, what questions you ask, what solutions you reach for. An engineer who sees systems primarily through the lens of Flow will optimise for throughput. One who defaults to the State lens will look for consistency problems. Neither is wrong. Both are incomplete alone.

The twelve mental models are not a complete description of a system. They are twelve different lenses. Apply all twelve to any system and you will see things each individual lens misses.

You have been using mental models since you wrote your first program. What follows is not new material — it is naming and structuring something you are already doing, so you can do it deliberately.

How to Use This Chapter

Each mental model is presented at three levels. Layer 1 is the recall trigger — the name and a one-line definition. Layer 2 is the reconstruction paragraph — enough to rebuild the concept. Layer 3 is the full discussion — examples, applications, and failure modes.

When revising: read Layer 1. If you can reconstruct the concept from that alone, move on. If not, read Layer 2, and go on to Layer 3 only if that is still not enough.

The 12 Mental Models

#	Model	Layer 1 — Recall Trigger
MM1	Transformation	Computation is input → function → output
MM2	Search	Many problems are finding the best state in a large space
MM3	Optimisation	Find the best solution under constraints
MM4	Flow	Systems are defined by how data moves through them
MM5	State	Systems remember things; state management is the hardest problem in distributed systems
MM6	Networks	Relationships between things are first-class objects
MM7	Feedback	Systems react to their own output
MM8	Concurrency	Multiple actors operating simultaneously on shared state
MM9	Redundancy	Copies protect against loss
MM10	Layered Abstraction	Reason at any level of the stack without knowing all the levels beneath
MM11	Indirection	Any problem in computing can be solved by adding a level of indirection
MM12	Tradeoffs	Every decision is a tradeoff, because certain outcomes are mutually exclusive

Each Mental Model in Detail

MM1 — Transformation

Layer 1: Computation is input → function → output. What does this system transform, from what, into what?

Layer 2: Every function, every API endpoint, every service converts inputs into outputs. This model asks you to identify the transformation explicitly: what is the input type, what is the output type, and what is the function’s contract? Systems that are unclear about their transformation contract are the ones with the most surprising bugs.

Layer 3: The transformation model is the entry point for understanding any component. Before asking how something works, ask what it does. A payment service transforms a payment request into a payment confirmation and a database mutation. A recommendation engine transforms a user identifier into a ranked list of items. A load balancer transforms an inbound HTTP request into a forwarded HTTP request on a different socket.

Where it matters: Interface design. Testing — if you can state the transformation, you can write tests. Debugging — if the output is wrong, trace the inputs through the transformation.

Where it breaks: Systems with side effects that are larger than their declared transformation. A function that “returns a user” but also silently creates an audit log, sends an email, and updates a counter is doing more than its transformation contract says. This is how hidden coupling is created.
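One way to keep the contract honest is to return the side effects as data rather than perform them silently. The sketch below is a minimal illustration in Python, not a recommended payment design. The names `PaymentRequest`, `PaymentConfirmation`, `Effect`, and `process_payment` are hypothetical and exist only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentRequest:
    account_id: str
    amount_cents: int


@dataclass(frozen=True)
class PaymentConfirmation:
    account_id: str
    amount_cents: int
    status: str


@dataclass(frozen=True)
class Effect:
    """A side effect described as data, to be executed by the caller."""
    kind: str
    detail: str


def process_payment(request: PaymentRequest) -> tuple[PaymentConfirmation, list[Effect]]:
    """Transform a request into a confirmation plus the effects it implies."""
    if request.amount_cents <= 0:
        raise ValueError("amount must be positive")
    confirmation = PaymentConfirmation(request.account_id, request.amount_cents, "confirmed")
    # Every effect is listed in the return value, so none of them is hidden.
    effects = [
        Effect("db_write", f"debit {request.amount_cents} from {request.account_id}"),
        Effect("audit_log", f"payment confirmed for {request.account_id}"),
    ]
    return confirmation, effects


confirmation, effects = process_payment(PaymentRequest("acct-1", 2500))
print(confirmation.status, [effect.kind for effect in effects])
```

Because the input type, output type, and effects are all visible in the signature, a test can assert on the whole transformation without mocking anything.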

MM2 — Search

Layer 1: Many problems are finding the best state in a large space. What state are you looking for, and how do you navigate the space?

Layer 2: Search is not just about search boxes. Any algorithm that explores a set of possibilities to find one that meets a criterion is a search. Route finding, recommendation, constraint satisfaction, and query optimisation are all search problems. The structure of the search space determines which algorithm works — exhaustive, greedy, heuristic, branch-and-bound.

Layer 3: Database query optimisation is search: the query planner searches the space of execution plans to find the one with the lowest estimated cost. A recommendation engine searches the space of items to find the ones most likely to be engaged with. A compiler searches the space of register allocations to find one that minimises spills.

Where it matters: Any time performance depends on reducing the search space — indexes, caches, filters, pruning strategies.

Where it breaks: When the search space is larger than expected. An O(n²) algorithm that was fine at n=1000 becomes unusable at n=100,000. The search model asks: how does the search space grow with data volume?
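The growth question can be made concrete by counting steps. The sketch below compares a search over every pair with the same search backed by a set used as an index. The function names and the pair-sum problem are assumptions chosen for illustration.

```python
def has_pair_with_sum_quadratic(values, target):
    """Check every pair: the search space is all n*(n-1)/2 pairs."""
    steps = 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            steps += 1
            if values[i] + values[j] == target:
                return True, steps
    return False, steps


def has_pair_with_sum_indexed(values, target):
    """Use a set as an index: each value costs one lookup instead of a scan."""
    seen = set()
    steps = 0
    for value in values:
        steps += 1
        if target - value in seen:
            return True, steps
        seen.add(value)
    return False, steps


values = list(range(1000))
# No two values in 0..999 sum to -1, so both searches run to the end.
found_slow, steps_slow = has_pair_with_sum_quadratic(values, -1)
found_fast, steps_fast = has_pair_with_sum_indexed(values, -1)
print(steps_slow, steps_fast)  # 499500 1000
```

At n = 100,000 the pairwise version would face about 5 × 10^9 pairs, roughly 10,000 times the work at n = 1000, while the indexed version grows only 100-fold.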

MM3 — Optimisation

Layer 1: Find the best solution under constraints. Every architectural decision is optimisation: best performance subject to cost, latency, consistency, and complexity constraints.

Layer 2: Every system makes trade-offs — latency for throughput, consistency for availability, simplicity for flexibility. These are optimisation problems with an objective function (what you are maximising) and constraints (what you cannot violate). Making these explicit converts an argument about preferences into a resolvable technical question.

Layer 3: Kubernetes bin-packing is literally an optimisation problem: pack the pods onto the minimum number of nodes subject to resource constraints. The CAP theorem is an optimisation impossibility result: during a network partition you must give up either consistency or availability, so you cannot maximise all three properties at once. Every SLO is a constraint in such a problem: maximise feature velocity subject to error rate ≤ 0.1%.
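To show the shape of a bin-packing problem, here is a sketch of the first-fit-decreasing heuristic. It is a textbook approximation, not how the Kubernetes scheduler works, and the CPU figures and node capacity are assumed values.

```python
def first_fit_decreasing(pod_cpu_requests, node_capacity):
    """Place pods on as few nodes as the heuristic finds; returns per-node pod lists."""
    nodes = []  # each node is a list of the pod CPU requests placed on it
    for request in sorted(pod_cpu_requests, reverse=True):
        if request > node_capacity:
            raise ValueError(f"pod requesting {request} cannot fit on any node")
        for node in nodes:
            # Constraint: the sum of requests on a node must not exceed capacity.
            if sum(node) + request <= node_capacity:
                node.append(request)
                break
        else:
            # No existing node had room, so open a new one.
            nodes.append([request])
    return nodes


# CPU requests in millicores; 4000 millicores per node is an assumed capacity.
placement = first_fit_decreasing([2500, 1500, 1000, 3000, 500, 2000, 1500], 4000)
print(len(placement), placement)
```

The objective (fewest nodes) and the constraint (capacity per node) are separate lines in the code, which is the point of the model: once both are written down, the argument is about numbers rather than preferences.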

Where it matters: Architecture decisions, capacity planning, resource allocation, performance tuning.

Where it breaks: When the constraints are wrong — you optimise for the stated constraint (e.g., P50 latency) while the real constraint was unstated (e.g., P99.9 latency during peak). Goodhart’s Law is this failure mode.

MM4 — Flow

Layer 1: Systems are defined by how data moves through them. Map the data flow and you understand the system’s bottlenecks, latency profile, and throughput limits.

Layer 2: Every request is data that flows: through a network, into a load balancer, to an application server, to a database, back to the application, back through the network. The latency of the system is the sum of the latencies of each hop on the path. The throughput is constrained by the bottleneck stage. Little’s Law relates these quantities: the average number of requests in the system equals the arrival rate times the average time each request spends in the system.
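A small worked example of these three quantities, using Little's Law in its standard form (average items in the system = arrival rate × average time in the system). All per-hop figures are assumed for illustration, and the return path is folded into each hop's latency.

```python
# Per-hop figures below are assumed for illustration, not measurements.
hops = [
    # (name, latency in ms, max throughput in requests/second)
    ("load balancer", 1.0, 50_000),
    ("application server", 12.0, 4_000),
    ("database", 7.0, 1_500),
]

# Latency adds up along a sequential path.
end_to_end_latency_ms = sum(latency for _, latency, _ in hops)

# Throughput is capped by the slowest stage.
bottleneck_name, _, bottleneck_rps = min(hops, key=lambda hop: hop[2])

# Little's Law: L = lambda * W.
arrival_rate_rps = 1_000
requests_in_flight = arrival_rate_rps * (end_to_end_latency_ms / 1000)

print(end_to_end_latency_ms, bottleneck_name, bottleneck_rps, requests_in_flight)
```

With these numbers the path takes 20 ms, the database caps throughput at 1,500 requests per second, and at 1,000 requests per second about 20 requests are in flight on average.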

Layer 3: The flow model is how you read an architecture diagram correctly. Trace the path of a request from entry to exit. At each hop: what is the latency? What is the queue depth? What happens when this component is slow? Where is the bottleneck? The answers to these questions determine the optimisation priority.

Where it matters: Performance analysis, capacity planning, identifying the critical path for latency reduction.

Where it breaks: When flows branch and merge in non-obvious ways. Fan-out patterns (one request spawning multiple downstream calls) are flow problems that look like single requests but are secretly parallel flows, each with their own latency distribution.
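The effect of fan-out on tail latency can be estimated with one line of arithmetic. The sketch assumes the downstream calls are independent and that the request waits for all of them. Both are simplifications, and the function name is hypothetical.

```python
def probability_any_slow(fan_out, per_call_slow_probability):
    """Chance that at least one of `fan_out` independent calls is slow."""
    return 1 - (1 - per_call_slow_probability) ** fan_out


# Each downstream call exceeds its own P99 latency 1% of the time, by definition.
for fan_out in (1, 10, 100):
    print(fan_out, round(probability_any_slow(fan_out, 0.01), 3))
# 1 0.01
# 10 0.096
# 100 0.634
```

Under these assumptions, a request that fans out to 100 backends hits at least one P99-slow call about 63% of the time, so the backend's P99 becomes close to the typical experience of the parent request.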

MM5 — State

Layer 1: Systems remember things. State is what a system knows at a point in time. State management is the hardest problem in distributed systems.

Layer 2: Stateless systems are easy to scale — add instances. Stateful systems are hard — instances must coordinate or route consistently. Every distributed systems problem — consistency, consensus, eventual convergence, split-brain — is a state problem. The state model asks: what does this component remember, where is that memory stored, and who else can change it?

Layer 3: HTTP is stateless by design — each request is independent. Sessions are state added on top of HTTP. Distributed caches are shared state. Databases are durable state. The hard problems emerge when multiple components have overlapping views of the same state: the cache holds one value, the database holds another. Who is correct? This is the fundamental tension between Thread T5 (Caching) and Thread T7 (State Machines).

Where it matters: Any horizontal scaling decision, any consistency requirement, any session or authentication design.

Where it breaks: When components accumulate state they were not designed to own. Stateful application servers that cannot be restarted without losing session data are the canonical example.

MM6 — Networks

Layer 1: Relationships between things are first-class objects. Model the nodes and the edges, not just the nodes.

Layer 2: The network model applies whenever the interesting behaviour comes from relationships rather than individual components. Social graphs, dependency graphs, communication graphs, network topologies — all are problems where the edges carry the information. Ignoring the edges means missing the most important part of the system.

Layer 3: Conway’s Law is a network model observation: the communication graph of a software team determines the dependency graph of the software it produces. This is why re-architecting without re-organising does not work — you have changed the network topology of the code but left the network topology of the teams unchanged.

Where it matters: Microservice dependency management, social features, network-effects analysis, organisational design.

Where it breaks: When the network becomes cyclic. Cyclic dependencies in code, circular team ownership, circular service dependencies — all create systems that cannot be decomposed or tested independently.
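Cycles in a dependency graph can be detected mechanically. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9 and later). The service names and the `deployment_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def deployment_order(dependencies):
    """Return services in dependency order, or None if the graph has a cycle.

    `dependencies` maps each service to the set of services it depends on.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None


acyclic = {
    "checkout": {"payments", "inventory"},
    "payments": {"ledger"},
    "inventory": set(),
    "ledger": set(),
}
cyclic = {
    "checkout": {"payments"},
    "payments": {"ledger"},
    "ledger": {"checkout"},  # closes the loop
}

order = deployment_order(acyclic)
print(order)
print(deployment_order(cyclic))  # None
```

An acyclic graph yields an order in which every service comes after its dependencies. A cyclic graph has no such order, which is the formal version of "cannot be decomposed or tested independently".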

MM7 — Feedback

Layer 1: Systems react to their own output. A feedback loop where output reduces input (negative feedback) tends towards stability. A loop where output increases input (positive feedback) is unstable.

Layer 2: Rate limiters, circuit breakers, auto-scaling policies, surge pricing, and monitoring alerts are all feedback loops. The behaviour of the loop — whether it stabilises, oscillates, or diverges — is determined by the loop’s gain and delay. High gain with high delay produces oscillation. This is why auto-scaling policies with aggressive thresholds and slow reaction times produce thrashing.
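The interaction of gain and delay can be seen in a toy simulation. This is a minimal sketch of a proportional controller that corrects towards a target using a measurement that is some steps old. It is not a model of any real auto-scaler, and all names and numbers are assumptions.

```python
def simulate_controller(gain, delay, steps=100, start=100.0, target=50.0):
    """Proportional controller acting on a measurement that is `delay` steps old."""
    history = [start]
    for t in range(steps):
        observed = history[max(0, t - delay)]  # stale reading of the system
        correction = gain * (observed - target)
        history.append(history[-1] - correction)
    return history


def late_error(history, target=50.0, window=10):
    """Largest distance from the target over the final `window` steps."""
    return max(abs(value - target) for value in history[-window:])


immediate = simulate_controller(gain=0.8, delay=0)
delayed = simulate_controller(gain=0.8, delay=3)

print(late_error(immediate))  # effectively zero: the loop settles
print(late_error(delayed))    # far above the starting error: the loop diverges
```

The gain is identical in both runs. Only the staleness of the measurement differs, and that alone turns a loop that settles into one that overshoots further on every swing.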

Layer 3: The thundering herd problem is a feedback loop gone wrong: cache expires → all clients miss → all clients hit database → database slows → more requests queue → more timeouts → more retries → database under more load. Each loop iteration amplifies the problem. Prevention (jittered TTLs, mutex on miss) adds negative feedback to break the loop.
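The two preventions named above can be sketched in a few lines. This is an in-process illustration only: the class `SingleFlightCache` and the loader function are hypothetical, and a single lock guards all keys for simplicity where a production design would more likely lock per key.

```python
import random
import threading
import time


class SingleFlightCache:
    """In-process cache sketch: jittered TTLs plus one loader call per miss."""

    def __init__(self, loader, base_ttl_seconds=60.0, jitter_fraction=0.1):
        self._loader = loader
        self._base_ttl = base_ttl_seconds
        self._jitter = jitter_fraction
        self._entries = {}  # key -> (value, expiry timestamp)
        self._lock = threading.Lock()

    def _jittered_expiry(self):
        # Spread expiries so keys written together do not all expire together.
        spread = self._base_ttl * self._jitter
        return time.monotonic() + self._base_ttl + random.uniform(-spread, spread)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]
        with self._lock:  # mutex on miss: only one thread reloads at a time
            entry = self._entries.get(key)
            if entry is not None and entry[1] > time.monotonic():
                return entry[0]  # another thread refilled while we waited
            value = self._loader(key)
            self._entries[key] = (value, self._jittered_expiry())
            return value


load_calls = []


def slow_database_read(key):
    load_calls.append(key)
    time.sleep(0.05)  # stand-in for a slow query
    return f"value-for-{key}"


cache = SingleFlightCache(slow_database_read)
threads = [threading.Thread(target=cache.get, args=("user:42",)) for _ in range(20)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(len(load_calls))  # 1
```

Twenty concurrent misses produce one database read instead of twenty, so the load on the database no longer scales with the number of waiting clients.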

Where it matters: Any control system: rate limiting, circuit breaking, auto-scaling, pricing algorithms, recommendation engine training.

Where it breaks: When the team optimises the metric instead of the outcome. Goodhart’s Law: the feedback loop measures the right thing, the team optimises it, and it stops measuring the right thing.

MM8 — Concurrency

Layer 1: Multiple actors operating simultaneously on shared state. Every race condition, deadlock, and data corruption is a concurrency problem.

Layer 2: Concurrency problems are not isolated bugs; they are emergent behaviour from the combination of parallel execution and shared mutable state. Remove either the parallelism or the shared mutable state and the problem disappears. Functional programming removes mutation (immutability). The actor model removes sharing (each actor owns its state and communicates only through its mailbox). Distributed transactions are attempts to make shared state safe under concurrent modification.

Layer 3: The two-phase locking protocol exists to prevent concurrent transactions from creating inconsistencies. The database isolation levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE) are choices on the concurrency/performance tradeoff axis. The higher the isolation, the fewer concurrency anomalies, the lower the throughput.

Where it matters: Any time multiple requests can modify the same data: inventory counts, account balances, seat reservations, shared documents.

Where it breaks: When locking is added reactively rather than by design. Systems that were originally single-threaded and later made concurrent typically have locking that is either too coarse (bottleneck) or too fine (deadlock risk).
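A minimal sketch of designing the lock in from the start, using the inventory case. The `Inventory` class is hypothetical and lives in one process. Across several servers the same check-and-decrement would need a database transaction or an equivalent mechanism.

```python
import threading


class Inventory:
    """Stock counter where check-and-decrement is one atomic step."""

    def __init__(self, available):
        self._available = available
        self._lock = threading.Lock()

    def reserve(self):
        # Without the lock, two threads could both read available == 1,
        # both pass the check, and both decrement: an oversell.
        with self._lock:
            if self._available > 0:
                self._available -= 1
                return True
            return False

    @property
    def available(self):
        with self._lock:
            return self._available


inventory = Inventory(available=10)
results = []


def buyer():
    results.append(inventory.reserve())


threads = [threading.Thread(target=buyer) for _ in range(50)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(results.count(True), inventory.available)  # 10 0
```

The lock covers exactly the read, the check, and the write, and nothing else. That granularity is the design decision: wider and it becomes a bottleneck, narrower and the check can be invalidated before the write.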

MM9 — Redundancy

Layer 1: Copies protect against loss. Every SPOF in a system is a place where redundancy was not applied.

Layer 2: Hardware fails. The question is not whether a component will fail but how the system behaves when it does. Redundancy converts individual component failure from a system failure into a tolerated event. The cost is complexity: multiple copies require a consistency mechanism (Thread T9) to agree on which copy is current.

Layer 3: Redundancy has a cost that is often underestimated. N+1 redundancy means one extra copy. If the primary fails, the replica must handle the full load — can it? Active-active redundancy means both copies serve traffic simultaneously — do they stay consistent? Geographic redundancy means copies in different datacentres — what happens when the network between them partitions?

Where it matters: Any component on the critical path to availability. Any data that cannot be reconstructed.

Where it breaks: When redundancy is added to the wrong components. Replicating a database across three nodes does not help if the load balancer in front of them is a single point of failure.
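The load balancer example can be checked with simple availability arithmetic. The sketch assumes failures are independent and that a replicated tier is up if any one replica is up. Both are simplifications, and the 99% figures are invented for illustration.

```python
def parallel_availability(component_availability, copies):
    """Available if at least one of `copies` independent replicas is up."""
    return 1 - (1 - component_availability) ** copies


def series_availability(*availabilities):
    """Available only if every component in the chain is up."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


# Assumed figures: each database node and each load balancer is up 99% of the time.
database_cluster = parallel_availability(0.99, copies=3)
single_lb = series_availability(0.99, database_cluster)
redundant_lb = series_availability(parallel_availability(0.99, copies=2), database_cluster)

print(round(database_cluster, 6))  # 0.999999
print(round(single_lb, 4))         # 0.99
print(round(redundant_lb, 4))      # 0.9999
```

Three database replicas reach six nines on their own, but behind a single 99% load balancer the system as a whole stays at about 99%. The chain is only as available as its least redundant link.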

MM10 — Layered Abstraction

Layer 1: Reason at any level of the stack without knowing all the levels beneath. Abstractions let you move fast. Leaky abstractions slow you down.

Layer 2: TCP is a reliable byte stream. You do not need to know about IP fragmentation to write a web server. HTTP is a request-response protocol. You do not need to know about TCP to design an API. Each layer provides an abstraction that hides the complexity below. The Law of Leaky Abstractions (F9 #13) says these abstractions always eventually expose implementation details — but a good abstraction delays that moment as long as possible.

Layer 3: The entire Knowledge Stack (Chapter 2) is a layered abstraction model. You can design a payment service (Layer 4) without knowing the details of consistent hashing (Layer 3). You can optimise a database query without understanding B-tree page splits. The knowledge of lower layers is not needed for routine work — it is needed when abstractions leak.

Where it matters: Framework and library design, API design, microservice interface design.

Where it breaks: When abstractions leak and engineers do not know the layer below. A developer who does not know that HTTP/1.1 uses persistent connections by default will be confused by how a connection pool behaves under load.

MM11 — Indirection

Layer 1: Any problem in computing can be solved by adding a level of indirection. Every proxy, load balancer, DNS record, and virtual address is indirection in practice.

Layer 2: Indirection decouples the caller from the callee. If the caller holds a direct reference, changing the callee requires changing the caller. If the caller holds an indirect reference (a name, an address, an interface), the callee can be swapped out without the caller knowing. This is why service discovery, load balancers, and dependency injection all exist.

Layer 3: DNS is indirection: you hold a hostname, not an IP address. When the server changes IP, no client code changes. A load balancer is indirection: clients connect to the VIP, not to any specific server. Dependency injection is indirection: components receive their dependencies rather than constructing them, enabling swapping for tests.
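Dependency injection is the easiest of these to show in code. In the sketch below the service depends on an interface rather than a concrete sender, so a test can swap in a recording double. All class names are hypothetical.

```python
from typing import Protocol


class EmailSender(Protocol):
    def send(self, to: str, subject: str) -> None: ...


class SmtpEmailSender:
    """Stand-in for a real sender; a production version would talk to a mail server."""

    def send(self, to: str, subject: str) -> None:
        raise NotImplementedError("network access is out of scope for this sketch")


class RecordingEmailSender:
    """Test double that records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, to: str, subject: str) -> None:
        self.sent.append((to, subject))


class SignupService:
    def __init__(self, email_sender: EmailSender) -> None:
        # The service holds an interface, not a concrete sender.
        self._email_sender = email_sender

    def register(self, address: str) -> None:
        self._email_sender.send(address, "Welcome")


recorder = RecordingEmailSender()
SignupService(recorder).register("ada@example.com")
print(recorder.sent)  # [('ada@example.com', 'Welcome')]
```

`SignupService` never names `SmtpEmailSender` or `RecordingEmailSender`. Swapping one for the other changes no line inside the service, which is the decoupling the model describes.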

Where it matters: Any place you need to decouple two components for independent deployment, independent testing, or future change.

Where it breaks: When indirection adds latency or complexity without corresponding benefit. Every additional hop has a cost. Every additional indirection layer adds debugging complexity. The rule: add indirection when the decoupling value exceeds the latency and complexity cost.

MM12 — Tradeoffs

Layer 1: Every decision is a tradeoff. This model is the acknowledgement that certain outcomes are mutually exclusive. There is no universally correct architecture, only choices that are better for specific requirements.

Layer 2: This mental model is meta — it governs how you apply the other eleven. The correct lens to apply to a problem is itself a tradeoff decision. When to use the Redundancy model vs the State model, when to reach for the Flow model vs the Feedback model — these are choices based on what the situation demands. Fluency with the tradeoffs framework (F4) is what separates engineers who see the right tradeoffs from those who see the comfortable ones.

Layer 3: The most common engineering mistake is not making a wrong tradeoff — it is not naming the tradeoff at all. A system that chooses eventual consistency without the team knowing they made that choice will produce a data inconsistency bug that nobody can explain. A caching strategy that trades consistency for performance will cause stale data issues that appear random. Name the tradeoff, document it, and the system becomes predictable.

Where it matters: Every architectural decision. Every design review. Every technology selection.

Where it breaks: When the tradeoff is named but not revisited. Tradeoffs made at 10K users are not always valid at 10M users. Requirements change. The tradeoffs must be reviewed with them.

How Items Connect — Mental Model Interactions

The twelve models are not independent. Several pairs create productive tension:

MM5 (State) constrains MM9 (Redundancy): Adding redundant copies creates the question of which copy is current. Redundancy is never free — it converts a single-component problem into a consistency problem.

MM7 (Feedback) amplifies MM8 (Concurrency): Under concurrency, feedback loops can amplify problems faster than they could in any single-threaded system. The thundering herd and retry storms are feedback effects that concurrency makes worse.

MM11 (Indirection) enables MM10 (Layered Abstraction): Every abstraction layer is built from indirection — a stable interface that hides the implementation below. The two models are inseparable in practice.

MM12 (Tradeoffs) governs all others: Which lens to apply is itself a tradeoff decision. Fluency is not knowing all twelve individually — it is knowing when to switch between them.

Apply the models in sets: diagnosing a slow system needs MM4 (Flow) + MM3 (Optimisation) + MM7 (Feedback). Diagnosing a consistency failure needs MM5 (State) + MM8 (Concurrency) + MM9 (Redundancy). No single model covers the whole system.

In Practice: Reading a Cache Through Four Lenses

A distributed cache sits between the application tier and the database.

Flow (MM4): Read requests flow to the cache first. Cache hits return in microseconds. Cache misses add a full database round-trip. The cache’s hit rate determines the effective latency distribution of all read requests.

State (MM5): The cache holds state. That state can become stale when the database is updated. The TTL is the policy for how long stale state is tolerated.

Feedback (MM7): When the cache is healthy, read load on the database is low. When the cache is overloaded or fails, read load on the database spikes. The database degrades. The cache misses increase further. The feedback loop is positive — amplifying the problem.

Redundancy (MM9): A single-node cache is a SPOF for read performance. A clustered cache is redundant — but now requires consistent hashing (T1) to route requests to the right node, and introduces the question of what happens when a node fails and its keys must be served cold.

Four lenses. Four different problems. All present in the same component. This is why fluency with the mental models matters — any one of them alone misses something.

Self-Assessment

Name all 12 mental models without looking. For any you cannot recall, state the type of system problem that model is designed to surface.
A service is slow at P99 but fast at P50. Which three mental models would you apply first to diagnose it? What specific question does each model ask that the others cannot?
Explain the interaction between MM9 (Redundancy) and MM5 (State). Give a concrete example where adding redundancy makes state management harder, and describe how you would resolve the tension.
Two engineers are designing a caching layer. One wants a larger cache with simple eviction; the other wants a smaller cache with a sophisticated eviction policy. Which mental models is each engineer primarily applying? What information would make the choice unambiguous?
A new hire has spent three years writing backend services and knows algorithms well. Without seeing their code, predict which mental models they are likely to apply fluently and which they are likely to underuse. How would this show up in a code review?
