Tree Data Structures in Production Systems

Open a PostgreSQL table with a million rows and query for one value. It comes back in under a millisecond. Not because the database scanned a million rows — because it kept a sorted tree of keys on disk, and each comparison threw away half the remaining candidates.

Open a directory with 100,000 files and look up a name. The operating system finds it in microseconds, by traversing a tree that organises filenames in sorted order.

Parse a Python expression and evaluate it. The interpreter builds a tree of nodes, walks it in post-order, and evaluates each node as it goes.

Three domains, three systems, one structure: a tree. The variant differs — B-tree, directory tree, parse tree — but the variant is not arbitrary. Each one is the same idea reshaped to fit the constraints of where it lives.

What a Tree Actually Is

A tree is a connected, acyclic graph. Its two fundamental operations are insertion (place a node in its correct position) and search (find a node by key). Both walk from the root toward a leaf, making one comparison per node to decide which child to follow.

Performance depends entirely on height. A balanced tree with N nodes has height O(log N). An unbalanced tree — the pathological case where each node has one child — has height O(N), which is just a linked list wearing a costume.

Three factors vary across tree variants, and every production tree picks them based on its environment:

Branching factor — children per node (binary = 2, B-tree = hundreds)
Balance guarantee — whether height stays equal across branches
Node size — how much data fits in one node (one key, or hundreds)

PostgreSQL: The B-Tree, Reshaped for Disk

A binary search tree makes one comparison and follows one of two children. On disk, that is a disaster. A disk read has a fixed cost whether you read one byte or 8 kilobytes — so a binary tree, with one key per node, pays a full disk read to eliminate a single comparison's worth of candidates.

The B-tree fixes this by matching the node to the medium. Each B-tree node is one disk page — 8 KB in PostgreSQL — holding around 400 sorted keys.

B-tree node = 1 disk page = ~400 keys

  height = log₄₀₀(N)
  400¹ =        400 rows   → 1 disk read
  400² =    160,000 rows   → 2 disk reads
  400³ = 64,000,000 rows   → 3 disk reads

Three disk reads to find any row in a 64-million-row table. Each level of the tree costs exactly one disk I/O, and a node holds 400 keys, so the tree is impossibly shallow. This is why B-trees dominate database indexing — not because binary search trees are wrong, but because B-trees are binary search trees redesigned for a medium where reading one byte costs the same as reading 8 KB.

File Systems: The Directory Tree

Every major file system — ext4, APFS, NTFS — stores directory entries in a tree, traversed every time you resolve a path. Opening /usr/local/bin/python is a sequence of tree lookups: find usr in the root's tree, then local in that tree, then bin, then python.

/  →  usr  →  local  →  bin  →  python
   lookup   lookup   lookup   lookup

On ext4, a directory with more than a handful of entries uses an HTree — a hash tree — giving O(log N) lookup per path component. A file five levels deep in directories of 10,000 files costs 5 × log(10,000) ≈ 70 comparisons. Fast, but the lesson is real: deep paths in crowded directories are measurably slower than shallow paths in sparse ones.

Parsers: The Abstract Syntax Tree

Every interpreter and compiler turns source code into an abstract syntax tree before doing anything with it. The expression (a + b) * (c - d) becomes:

       *
      / \
     +   -
    / \ / \
   a  b c  d

Parsing builds the tree; evaluation traverses it in post-order — both children before the parent. The post-order walk visits a, b, +, c, d, -, *, which is exactly the order a stack-based interpreter evaluates the expression. The tree's height is the nesting depth of the code, which is why deeply nested expressions risk stack overflow on recursive traversal — a real constraint compilers handle with explicit-stack iteration.

The Tradeoff: Precomputation vs On-Demand

A B-tree index is precomputed (AT4). Every INSERT and UPDATE must update the index — you pay at write time. The payoff comes at read time: a query that would scan the whole table instead walks three levels of tree.

But the payoff is conditional. PostgreSQL's query planner knows this and does not always use the index. For a query that returns 30% of a table, the index produces 30% of N random heap fetches — and random I/O is 10–100× slower than the sequential I/O of a plain table scan. This is FM5, latency amplification: forcing an index scan on a poorly selective query makes it slower. The index wins for selective queries on large tables, and loses otherwise.

There is a second tradeoff — AT3, simplicity vs flexibility. A binary search tree is under 100 lines of code. A B-tree, with node splits, merges, and page management, runs to thousands. The flexibility of performing well on disk is bought with a large increase in implementation complexity.

The One Sentence

A tree turns a linear scan into a logarithmic walk by making every comparison eliminate a fraction of what remains — and the right tree variant is whichever one shapes its branching factor and node size to the cost structure of its storage medium. When a lookup is slow, ask which tree should be answering it, and whether that tree was built for the medium it lives on.