A load balancer routes a request to a server. A session cache returns a user’s data. A dependency resolver decides which packages to install and in what order.
Each of these is a system you have used, operated, or built. But there is a different way to look at them — one that makes you faster at debugging, better at capacity planning, and clearer in design reviews. The question is not “how does this system work?” The question is: “which algorithm is this system running, and what does that tell me about its performance characteristics?”
Every production system is built from algorithms you have already studied. The load balancer is running a hash function. The session cache is running a key-value lookup. The dependency resolver is running a topological sort. The translation layer between the algorithm and the system you operate is what this book provides.
The thread running through Book 2 is T12: Tradeoffs — every translation from algorithm to system is a tradeoff decision about what to optimise for and what to sacrifice.
This is the bridge between Book 1 and everything that follows. The methodology you build in this chapter is the lens you will apply in every subsequent chapter.
The claim is simple: every production system has an algorithmic core. Find that core, and you understand the system’s performance behaviour at a level no operations runbook can give you.
The algorithmic core is not hidden. It is visible to anyone who knows where to look. The method for finding it has four steps:
This is not reverse engineering. It is pattern recognition. The same patterns appear repeatedly because engineering problems have structure, and the algorithms that solve them have been developed and refined over decades.
A load balancer sits in front of a pool of servers. For every incoming request, it must choose which server to send it to. The choice matters: an unbalanced distribution creates hotspots (FM6); a distribution that ignores server health wastes capacity.
Find the bottleneck: the routing decision — made millions of times per second.
Identify the data structure: the server pool. A list of N servers, each with state (current load, health).
Identify the algorithm: key-to-server mapping. There are several options: - Round robin: iterate the list, assign each request to the next server. O(1) per decision. - Least connections: find the server with fewest active connections. O(N) per decision if naive, O(log N) with a min-heap. - Hash-based: hash the request key (source IP, session ID) to a server index. O(1) per decision, sticky routing.
Name the complexity: O(1) for round-robin and hash-based; O(log N) for heap-based least connections.
What does this tell you? A round-robin balancer will not degrade as the server pool grows. A naive least-connections balancer will. If your load balancer starts showing elevated routing latency at 100 servers that it did not show at 10 servers, the cause is identifiable — and the implementation detail to investigate becomes clear.
# Round-robin: O(1) per call, ignores server state
class RoundRobinBalancer:
def __init__(self, servers: list[str]):
self.servers = servers
self.index = 0
def next_server(self) -> str:
server = self.servers[self.index % len(self.servers)]
self.index += 1
return server
# Least-connections: O(log N) per call, uses a min-heap on connection count
import heapq
class LeastConnectionsBalancer:
def __init__(self, servers: list[str]):
# heap entries: (connection_count, server_name)
self.heap = [(0, s) for s in servers]
heapq.heapify(self.heap)
def acquire(self) -> str:
count, server = heapq.heappop(self.heap)
heapq.heappush(self.heap, (count + 1, server))
return serverThe round-robin balancer is O(1). The heap-based balancer is O(log N). This matters when N is large and routing decisions are on the critical path of every request.
A session cache stores user session data — authentication tokens, preferences, temporary state. The system is a cache layer in front of a database. Its job: make reads fast by avoiding a database round-trip for recently accessed data.
Find the bottleneck: the lookup operation. Every authenticated request requires a session lookup.
Identify the data structure: a key-value map. Session ID → session data.
Identify the algorithm: hash-based lookup. The session ID is the key; the hash map makes lookup O(1).
Name the complexity: O(1) average for lookup, O(N) for full scan (which a session cache should never need to do).
What does this tell you? Session cache latency should be flat regardless of how many sessions are stored. If you observe cache latency increasing with session count, the cache is not using O(1) lookup — something is wrong. This becomes diagnosable once the algorithm under investigation is identified.
# O(1) lookup — this is the invariant the session cache must preserve
class SessionCache:
def __init__(self):
self._store: dict[str, dict] = {}
def get(self, session_id: str) -> dict | None:
return self._store.get(session_id) # O(1) average
def set(self, session_id: str, data: dict) -> None:
self._store[session_id] = data # O(1) average
def delete(self, session_id: str) -> None:
self._store.pop(session_id, None) # O(1) averageA package manager (pip, npm, maven) must determine which packages to install and in what order. Package A depends on B and C. B depends on D. The installer must ensure D is installed before B, and B before A.
Find the bottleneck: determining installation order at resolve time.
Identify the data structure: a directed acyclic graph (DAG). Packages are nodes; “depends on” relationships are directed edges.
Identify the algorithm: topological sort. Return an ordering where every dependency comes before the thing that depends on it.
Name the complexity: O(V + E) where V is the number of packages and E is the number of dependency edges. Linear in the size of the dependency graph.
What does this tell you? Dependency resolution time should scale linearly with the number of packages and dependencies. If a package manager is slow, the cause is almost never the topological sort itself (O(V+E) is fast) — it is the graph construction phase: downloading package metadata, resolving version constraints, building the edges. Understanding the algorithm tells you where to look.
from collections import defaultdict, deque
def topological_sort(packages: list[str], dependencies: dict[str, list[str]]) -> list[str]:
"""
Returns install order: every dependency appears before the package that requires it.
Uses Kahn's algorithm: BFS on nodes with in-degree 0.
"""
in_degree = defaultdict(int)
graph = defaultdict(list)
for pkg, deps in dependencies.items():
for dep in deps:
graph[dep].append(pkg)
in_degree[pkg] += 1
# Start with packages that have no dependencies
queue = deque([p for p in packages if in_degree[p] == 0])
order = []
while queue:
pkg = queue.popleft()
order.append(pkg)
for dependent in graph[pkg]:
in_degree[dependent] -= 1
if in_degree[dependent] == 0:
queue.append(dependent)
if len(order) != len(packages):
raise ValueError("Circular dependency detected")
return orderThe three worked examples follow the same four-step method. Apply it to any system:
| Step | Question | Output |
|---|---|---|
| Find the bottleneck | What does this system do at highest volume or with most latency impact? | An operation |
| Identify the data structure | What is that operation acting on? | A type (list, map, graph, tree, heap) |
| Identify the algorithm | What is the operation? | A named algorithm |
| Name the complexity | What is the time complexity of that algorithm on that data structure? | An O-notation |
The O-notation is the answer to “how will this system behave at 10× scale?” Every performance conversation that proceeds without naming the complexity is guessing.
AT4 — Precomputation vs. On-Demand
Every system that has an algorithmic core has made a choice about when to do the work. The load balancer can precompute routing weights (AT4: precomputation) or decide on every request (on-demand). The session cache precomputes nothing — it stores and retrieves. The dependency resolver precomputes the install graph at resolve time so that install time is fast.
The tradeoff is always: spend time now (precomputation) to save time later, or defer the work and pay it on every request. Naming the algorithm lets you name this tradeoff precisely.
T12 — Tradeoffs (the meta-tradeoff)
The method in this chapter is itself a tradeoff. Analysing a system as an algorithm gives you performance clarity at the cost of implementation detail. You will know the O-notation of the session cache lookup but not the memory layout of the underlying hash table. For most architectural decisions, the O-notation is what matters. For memory pressure problems, you need to go one level deeper.
FM11 — Observability Blindness
The four-step method finds the algorithmic core from the outside. It assumes you can observe what the bottleneck operation is. When a system lacks instrumentation — no per-operation latency histograms, no queue depth metrics, no hit/miss ratios — you cannot complete step 1. You are guessing.
The method requires observability. Before you can identify the algorithmic core, you need measurements that tell you which operation is the bottleneck. A system without metrics is a system you cannot analyse.
FM6 — Hotspotting
Naming the algorithm does not automatically reveal which instance of the algorithm is being run. Hash-based routing is O(1) — but with a naive hash function, two-thirds of requests might route to the same three servers out of twenty. The complexity class is correct; the constants are wrong.
The algorithm tells you the class of the problem. It does not replace measurement.
Redis is a hash map with O(1) GET and SET operations, plus sorted sets (a skip list, O(log N) per operation) and lists (a doubly-linked list). When an engineer says “Redis is slow for range queries,” they are observing that Redis uses a skip list for sorted sets (O(log N) per operation, O(N) for a full range scan) rather than a B-tree (also O(log N) per operation, but with much better cache behaviour for sequential scans). The algorithm explains the observation.
Elasticsearch builds an inverted index: for each term in the corpus, a sorted list of document IDs that contain that term. A full-text search query is an intersection of sorted lists — an O(N) merge that benefits from sorted order. The latency of Elasticsearch searches scales with the number of matching documents, not the total corpus size. This is the inverted index at work.
Git represents history as a DAG of commits.
git log is a graph traversal. git merge is
finding the lowest common ancestor (LCA) in that DAG — an O(depth)
operation. When a repository has been running for ten years with
hundreds of branches, merge operations touch more of the graph. The
algorithm explains why merge latency correlates with repository age.
Concept: Reading Systems as Algorithms (the four-step method)
Core Idea: Every production system has an algorithmic core: find the bottleneck operation, identify the data structure, identify the algorithm, name the complexity. The complexity class tells you how the system scales. A system without a named algorithm is a system whose scaling behaviour is unknown.
Tradeoff: AT4 — Precomputation vs. On-Demand (systems choose when to do algorithmic work; the algorithm type determines the cost of each choice)
Failure Mode: FM11 — Observability Blindness (you cannot find the bottleneck without metrics; the method requires measurement before analysis)
Signal: When a system shows unexpected latency growth at scale, or when a performance problem cannot be reproduced at small load — apply the four-step method. The complexity class will tell you at what scale the problem must appear.
State the four steps of the method introduced in this chapter. For each step, explain what it produces.
A colleague says “the database is slow.” Using the vocabulary of this chapter, what is missing from that statement? What four questions would you ask to make it precise?
What is the difference between an algorithm’s complexity class and its constant factor? Give one example where the constant factor matters more than the complexity class.
A rate limiter tracks how many requests a given API key has made in
the last 60 seconds. It must check the count and increment it on every
request. The system currently uses a Redis sorted set, storing each
request as a member with its Unix timestamp as the score. To check the
rate limit, it queries the count of members in the score range
[now - 60, now].
Apply the four-step method to this system. Name the data structure, algorithm, and complexity class.
Redis sorted sets use a skip list. What is the time complexity of the range query? How does this compare to a simpler data structure (a plain list)?
An engineer proposes replacing the sorted set with a fixed-size circular buffer of 60 counters — one per second — and incrementing the appropriate bucket on each request. Apply the four-step method to this alternative. What is the complexity? What has changed?
A search autocomplete system must, given a prefix string typed by the user, return the top 10 matching completions from a corpus of 50 million product names. The system must respond in under 20 milliseconds at the 99th percentile.
Apply the four-step method. What is the bottleneck operation? What data structure would you choose? What algorithm? What is the complexity?
A trie (prefix tree) is O(k) for prefix lookup, where k is the length of the prefix. A hash map of all possible prefixes is O(1) for lookup but O(total_prefixes) in space. Which would you choose? Name the tradeoff explicitly using AT framework notation.
The corpus is updated 1,000 times per minute (new products added, old products removed). How does this affect your data structure choice? What is the additional complexity introduced by updates?
A complete answer will: state the governing tradeoff by code, name the failure mode the design exposes, and justify the chosen design against at least one alternative.