The Computing Series

How It Works

PostgreSQL: The B-Tree Index

A PostgreSQL index on a column stores the column values in a B-tree. Each B-tree node is one disk page (8 KB by default). A page holds ~400 index entries. With 400 entries per page and a balanced tree, a table with 400³ = 64 million rows requires a B-tree with only 3 levels. Three disk reads to find any row in a 64-million-row table.

The B-tree index page layout:

A heap pointer is a (block_number, tuple_offset) pair that tells PostgreSQL where the actual row lives in the table’s heap storage. The index stores keys; the heap stores full rows. An index scan first searches the B-tree for the key (O(log N) disk reads), then fetches the heap page (one more disk read).

# Simulating B-tree search logic (simplified, in-memory)
class BTreeNode:
    def __init__(self, is_leaf: bool = True, order: int = 4):
        self.is_leaf = is_leaf
        self.keys: list = []           # sorted keys in this node
        self.children: list = []        # child nodes (internal) or heap pointers (leaf)
        self.order = order              # max keys per node

    def search(self, key) -> tuple | None:
        """Binary search within this node's keys."""
        lo, hi = 0, len(self.keys) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if self.keys[mid] == key:
                return (self, mid)
            elif self.keys[mid] < key:
                lo = mid + 1
            else:
                hi = mid - 1
        return None

def btree_search(node: BTreeNode, key) -> str | None:
    """
    Search for key in B-tree rooted at node.
    Returns heap pointer (simulated as string) or None.
    """
    result = node.search(key)
    if result:
        _, idx = result
        return node.children[idx]  # leaf: heap pointer

    if node.is_leaf:
        return None

    # Find correct child: first key > search key determines which subtree
    for i, k in enumerate(node.keys):
        if key < k:
            return btree_search(node.children[i], key)
    return btree_search(node.children[-1], key)

The critical property: at every level, the number of pages read is one (one disk I/O to load a node). The tree height is log_order(N). With order=400, height = log₄₀₀(N). For N = 64 million: height = 3.

This is why B-trees dominate database indexing. Not because binary search trees are wrong — but because B-trees are binary search trees redesigned for a storage medium where reading one byte costs the same as reading 8 KB.

File Systems: Directory Trees

Every major file system (ext4, APFS, NTFS, HFS+) stores directory entries in a tree. The tree is traversed whenever you resolve a path.

When you open /usr/local/bin/python, the operating system: 1. Starts at the root directory inode 2. Looks up usr in the root directory’s tree of entries → gets the inode for /usr 3. Looks up local in /usr’s tree → gets the inode for /usr/local 4. Looks up bin → inode for /usr/local/bin 5. Looks up python → inode for the file

Each step is a tree lookup in the directory structure for that path component. On ext4, directories with more than a few entries use an HTree (a hash tree): directory entries are hashed and stored in a B-tree-like structure, giving O(log N) lookup per directory component.

# Simplified path resolution
def resolve_path(path: str, filesystem: dict) -> int | None:
    """
    Resolve an absolute path to an inode number.
    filesystem: dict mapping inode -> {name: inode} for directories
    Returns inode number or None if not found.
    """
    components = [c for c in path.split('/') if c]
    current_inode = filesystem['root']

    for component in components:
        directory = filesystem.get(current_inode, {})
        if component not in directory:
            return None
        current_inode = directory[component]

    return current_inode

The number of disk reads for a path resolution is proportional to the path depth (number of / separators) multiplied by the height of each directory’s internal tree. A file at depth 5 with 10,000 files per directory requires 5 × O(log 10,000) = 5 × ~14 = ~70 comparisons. Still fast — but deep paths in crowded directories are slower than shallow paths in sparse directories.

Parsers: Abstract Syntax Trees

Every language interpreter and compiler converts source code into an abstract syntax tree (AST) before execution or compilation. The AST represents the logical structure of the program, stripped of syntactic noise (parentheses, commas, semicolons).

Consider the expression (a + b) * (c - d). The AST is:

       *
      / \
     +   -
    / \ / \
   a  b c  d

Parsing builds this tree; evaluation traverses it. The standard traversal order for arithmetic expressions is post-order: process both children before the parent. Post-order traversal of the tree above visits: a, b, +, c, d, -, * — which is exactly the order a stack-based interpreter evaluates the expression.

from dataclasses import dataclass
from typing import Union

@dataclass
class BinaryOp:
    op: str
    left: 'Node'
    right: 'Node'

@dataclass
class Literal:
    value: float

Node = Union[BinaryOp, Literal]

def evaluate(node: Node) -> float:
    """
    Post-order traversal of an AST.
    Children are evaluated before the operator is applied.
    """
    if isinstance(node, Literal):
        return node.value

    left_val = evaluate(node.left)   # evaluate left subtree first
    right_val = evaluate(node.right) # then right subtree

    # Apply operator at this node (post-order: parent processed last)
    ops = {'+': lambda a, b: a + b,
           '-': lambda a, b: a - b,
           '*': lambda a, b: a * b,
           '/': lambda a, b: a / b}
    return ops[node.op](left_val, right_val)

# (a + b) * (c - d) with a=3, b=4, c=10, d=2
tree = BinaryOp('*',
    BinaryOp('+', Literal(3), Literal(4)),
    BinaryOp('-', Literal(10), Literal(2)))

result = evaluate(tree)  # ((3+4) * (10-2)) = 56

The AST’s height is the nesting depth of the expression. Deep nesting (long chains of function calls, deeply nested conditionals) produces tall trees. Recursion-based traversal on a tall tree risks stack overflow — a practical constraint that compilers address by transforming tail recursion or by using iterative traversal with an explicit stack.


Read in the book →