Graphs: From Graph Theory to Social Networks to Conway's Law

In 1998, Sergey Brin and Larry Page described how they mapped the web. The algorithm was not sophisticated. It kept a frontier of URLs to visit, fetched each page, extracted its links, and added the unvisited ones back to the frontier. The frontier was a queue. The web was a graph.

That is the recurring move. A graph is a connected structure of nodes and edges, and once you learn to see it, it appears at every layer of the computing stack — and beyond the stack, in the organisations that build the stack. The graph is one of the most reusable structures in engineering precisely because so many different problems turn out to have the same shape.

The Web Is a Graph

A web crawler treats pages as nodes and hyperlinks as directed edges. It starts at seed URLs and expands outward. The traversal is breadth-first — BFS, queue-based — because BFS explores the pages closest to the seeds first, and pages linked from authoritative starting points matter more than pages reachable only through long chains.

The structure forces an engineering decision the textbook ignores. The frontier — the set of discovered-but-unvisited nodes — can grow to millions of URLs in the first few levels of a high out-degree graph. The graph's shape, not the algorithm, is what makes the frontier a resource problem. Production crawlers cap frontier size and back it with disk-based queues.

A Dependency Tree Is a Graph

A package manager must install all transitive dependencies of a package, in an order where each dependency comes before whatever needs it. Packages are nodes; "depends on" relationships are directed edges. The structure is a directed acyclic graph, and the traversal is depth-first — DFS dives down one complete path to a leaf before backtracking, which is exactly what recursive installation needs.

But "acyclic" is a promise the real graph can break. A version upgrade introduces a two-package mutual dependency, and a DFS with no cycle detection recurses forever. The fix is the two-set pattern: a visiting set tracks the current path (a node already in it is a back edge — a cycle), and a visited set tracks fully-processed nodes. The graph's structure — whether it is truly acyclic — is the thing the algorithm must defend against.

Memory Is a Graph

A garbage collector sees every object as a node and every reference as a directed edge. The mark phase is a traversal — BFS or DFS — starting from the GC roots: globals, stack variables, registers, the objects guaranteed live. Anything reachable from a root is live; anything unreachable is garbage.

This reveals a limit baked into the graph itself. A tracing GC marks everything reachable, even objects that are reachable but will never be used again. It cannot tell a memory leak from a needed object — both are simply nodes with an inbound edge from a root. That is why a long-lived data structure that quietly accumulates references leaks memory even in a garbage-collected language.

A Social Network Is a Graph

Users are nodes; connections are edges. "Find everyone within three degrees of separation of user X" is a bounded BFS. And the social graph exposes the failure mode common to all large graphs: in a network of a million users averaging 200 connections each, the BFS frontier at depth three is enormous. Wide graphs make BFS memory-hungry.

An Organisation Is a Graph — Conway's Law

Here is where the structure leaves the machine. Conway's Law observes that a system's architecture mirrors the communication graph of the organisation that built it. Teams are nodes; communication paths are edges. If three teams cannot talk to each other, the three components they own will not integrate cleanly — the missing edge in the org graph becomes a missing edge in the system graph. The graph you did not draw still constrains the graph you ship.

The same structure — nodes and edges — at every layer

  web page  ──link──────────▶ web page    BFS, queue frontier
  package   ──depends-on─────▶ package    DFS, topological order
  object    ──reference──────▶ object     GC mark traversal
  user      ──connection─────▶ user       bounded BFS
  team      ──communicates──▶ team        Conway's Law

  one structure ── two traversals ── two failure modes

The Tradeoff: Simplicity vs Flexibility

Traversing a graph forces a recurring choice — AT3 (Simplicity vs Flexibility). Recursive DFS is simple to write but bounded by the call stack: a dependency graph with 10,000 transitive nodes overflows Python's default 1,000-frame limit. Iterative DFS with an explicit stack is safer but more code. The recursive version is easier to understand; the iterative version is the one that survives a deep real-world graph.

Where It Fails: Unbounded Frontiers and Hidden Cycles

Every graph traversal in production meets FM3 (Unbounded Resource Consumption). BFS stores the whole frontier — and the frontier grows with graph width. DFS stores the path — and the call stack grows with graph depth. A web graph is wide; a deep dependency chain is deep. Both need explicit limits, because the graph does not come with any.

And cycles cause FM2 (Cascading Failures) in disguise: an undetected cycle in a dependency graph produces a hang or a stack overflow rather than a clear error, and the symptom shows up far from the cause. Every traversal of a graph that might contain cycles needs a visited set — the structure does not protect itself.

The One Sentence

A graph is nodes and edges, and the moment you learn to see it — in the web, in a dependency tree, in the heap, in a social network, in your own org chart — the same two traversals and the same two failure modes follow you everywhere, because the structure of the problem, not the domain it lives in, is what decides how you must walk it.

Concept: A graph is nodes and edges; the same structure recurs in the web, dependency trees, the heap, social networks, and org charts.

Core Idea: BFS (queue, width-bounded) and DFS (stack, depth-bounded) are the two traversals; the shape of the problem decides which you need.

Tradeoff: AT3 — Simplicity vs Flexibility: recursive DFS is simple but call-stack-bounded; iterative DFS is more code but survives deep graphs.

Failure Mode: FM3 — Unbounded Resource Consumption: BFS frontiers grow with width, DFS stacks grow with depth, and undetected cycles hang silently.

Signal: When a problem is a connected structure of nodes and edges, the traversal and its failure modes follow regardless of domain.

Series: Book 2, Ch 6