Tree Patterns Everywhere

Introduction

Open a PostgreSQL table with a million rows and query for a single value. PostgreSQL finds it in under a millisecond. Not because it scanned all million rows — because it maintained a sorted tree of keys on disk, and each comparison eliminates half the remaining candidates.

Open a directory with 100,000 files and look up a filename. The operating system finds it in microseconds. Not by scanning — by traversing a tree structure that organises filenames by their position in a sorted order.

Parse a Python expression and evaluate it. The interpreter builds a tree of nodes, traverses it in post-order, and evaluates each node as it goes.

Three different domains. Three different systems. One structure: a tree. The specific variant — B-tree, directory tree, parse tree — differs by the constraints of the storage medium. The traversal logic is the same in all three.

Thread Activation

In Book 1, you built binary search trees (Ch 25) and saw why B-trees minimise disk reads (Ch 26). The core insight of Ch 26: a disk read has a fixed cost regardless of how much data you read within a page boundary. A B-tree node is sized to match a disk page, so each level of the tree costs exactly one disk read.

This chapter traces Thread 2 (Trees) into three production systems. In each case, the specific tree variant is determined by the same analysis: what are the constraints of the storage medium, and what tree structure matches those constraints? The thread continues in Book 3, Ch 9, where B-tree indexes become the foundation of query optimisation.