A PostgreSQL index on a column stores the column values in a B-tree. Each B-tree node is one disk page (8 KB by default). A page holds ~400 index entries. With 400 entries per page and a balanced tree, a table with 400³ = 64 million rows requires a B-tree with only 3 levels. Three disk reads to find any row in a 64-million-row table.
The B-tree index page layout:
A heap pointer is a (block_number, tuple_offset) pair
that tells PostgreSQL where the actual row lives in the table’s heap
storage. The index stores keys; the heap stores full rows. An index scan
first searches the B-tree for the key (O(log N) disk reads), then
fetches the heap page (one more disk read).
# Simulating B-tree search logic (simplified, in-memory)
class BTreeNode:
def __init__(self, is_leaf: bool = True, order: int = 4):
self.is_leaf = is_leaf
self.keys: list = [] # sorted keys in this node
self.children: list = [] # child nodes (internal) or heap pointers (leaf)
self.order = order # max keys per node
def search(self, key) -> tuple | None:
"""Binary search within this node's keys."""
lo, hi = 0, len(self.keys) - 1
while lo <= hi:
mid = (lo + hi) // 2
if self.keys[mid] == key:
return (self, mid)
elif self.keys[mid] < key:
lo = mid + 1
else:
hi = mid - 1
return None
def btree_search(node: BTreeNode, key) -> str | None:
"""
Search for key in B-tree rooted at node.
Returns heap pointer (simulated as string) or None.
"""
result = node.search(key)
if result:
_, idx = result
return node.children[idx] # leaf: heap pointer
if node.is_leaf:
return None
# Find correct child: first key > search key determines which subtree
for i, k in enumerate(node.keys):
if key < k:
return btree_search(node.children[i], key)
return btree_search(node.children[-1], key)The critical property: at every level, the number of pages read is one (one disk I/O to load a node). The tree height is log_order(N). With order=400, height = log₄₀₀(N). For N = 64 million: height = 3.
This is why B-trees dominate database indexing. Not because binary search trees are wrong — but because B-trees are binary search trees redesigned for a storage medium where reading one byte costs the same as reading 8 KB.
Every major file system (ext4, APFS, NTFS, HFS+) stores directory entries in a tree. The tree is traversed whenever you resolve a path.
When you open /usr/local/bin/python, the operating
system: 1. Starts at the root directory inode 2. Looks up
usr in the root directory’s tree of entries → gets the
inode for /usr 3. Looks up local in
/usr’s tree → gets the inode for /usr/local 4.
Looks up bin → inode for /usr/local/bin 5.
Looks up python → inode for the file
Each step is a tree lookup in the directory structure for that path component. On ext4, directories with more than a few entries use an HTree (a hash tree): directory entries are hashed and stored in a B-tree-like structure, giving O(log N) lookup per directory component.
# Simplified path resolution
def resolve_path(path: str, filesystem: dict) -> int | None:
"""
Resolve an absolute path to an inode number.
filesystem: dict mapping inode -> {name: inode} for directories
Returns inode number or None if not found.
"""
components = [c for c in path.split('/') if c]
current_inode = filesystem['root']
for component in components:
directory = filesystem.get(current_inode, {})
if component not in directory:
return None
current_inode = directory[component]
return current_inodeThe number of disk reads for a path resolution is proportional to the
path depth (number of / separators) multiplied by the
height of each directory’s internal tree. A file at depth 5 with 10,000
files per directory requires 5 × O(log 10,000) = 5 × ~14 = ~70
comparisons. Still fast — but deep paths in crowded directories are
slower than shallow paths in sparse directories.
Every language interpreter and compiler converts source code into an abstract syntax tree (AST) before execution or compilation. The AST represents the logical structure of the program, stripped of syntactic noise (parentheses, commas, semicolons).
Consider the expression (a + b) * (c - d). The AST
is:
*
/ \
+ -
/ \ / \
a b c d
Parsing builds this tree; evaluation traverses it. The standard traversal order for arithmetic expressions is post-order: process both children before the parent. Post-order traversal of the tree above visits: a, b, +, c, d, -, * — which is exactly the order a stack-based interpreter evaluates the expression.
from dataclasses import dataclass
from typing import Union
@dataclass
class BinaryOp:
op: str
left: 'Node'
right: 'Node'
@dataclass
class Literal:
value: float
Node = Union[BinaryOp, Literal]
def evaluate(node: Node) -> float:
"""
Post-order traversal of an AST.
Children are evaluated before the operator is applied.
"""
if isinstance(node, Literal):
return node.value
left_val = evaluate(node.left) # evaluate left subtree first
right_val = evaluate(node.right) # then right subtree
# Apply operator at this node (post-order: parent processed last)
ops = {'+': lambda a, b: a + b,
'-': lambda a, b: a - b,
'*': lambda a, b: a * b,
'/': lambda a, b: a / b}
return ops[node.op](left_val, right_val)
# (a + b) * (c - d) with a=3, b=4, c=10, d=2
tree = BinaryOp('*',
BinaryOp('+', Literal(3), Literal(4)),
BinaryOp('-', Literal(10), Literal(2)))
result = evaluate(tree) # ((3+4) * (10-2)) = 56The AST’s height is the nesting depth of the expression. Deep nesting (long chains of function calls, deeply nested conditionals) produces tall trees. Recursion-based traversal on a tall tree risks stack overflow — a practical constraint that compilers address by transforming tail recursion or by using iterative traversal with an explicit stack.