Big-O Notation

A team ships a feature that passes all tests. n = 1,000 in test. Everything is fast. They deploy to production. n = 10,000,000. The feature times out. They roll back. They examine the code. It has not changed.

The code was always slow. They never gave it an input large enough to find out.

Big-O notation is the tool that lets you predict this failure before it happens. It is a language for naming how an algorithm’s resource consumption grows. If you can read Big-O, you can look at code and say: “This will fail at n = 100,000” — before you deploy it.

Thread Activation

Thread 12 (Tradeoffs) continues here from Chapter 17.

Big-O is the formal language for expressing the tradeoffs named in Chapter 17. In Chapter 17 you learned that time and space are the two dimensions. In this chapter you learn the vocabulary for naming positions on those dimensions. “O(n log n) time, O(n) space” is a statement about a tradeoff. You cannot make tradeoffs you cannot name. Thread 12 continues through to algorithm selection in Book 2, Chapter 20 — where you will choose between algorithms by comparing their Big-O profiles against specific workload characteristics.

The Concept

Big-O notation describes an upper bound on an algorithm’s growth rate. It captures the dominant term of the resource function and discards constant factors and lower-order terms.

Formally: f(n) is O(g(n)) if there exist constants c > 0 and n₀ ≥ 0 such that f(n) ≤ c · g(n) for all n ≥ n₀.

In engineering terms: O(g(n)) means “grows no faster than g(n) for large n, ignoring constants.”

Why Discard Constants?

Constants depend on hardware, compiler, language runtime, and memory layout. An algorithm that does 1,000 simple operations per element and one that does 1 operation per element are both O(n). On different hardware, the ratio between them shifts. But both remain O(n) forever — their growth shape does not change.

The growth shape is what determines scalability. Constants determine absolute speed on specific hardware. For system design, growth shape matters more.

The Common Complexity Classes

The six complexity classes you will encounter most often, ordered by growth rate:

O(1) — Constant. The algorithm takes the same time regardless of input size. Array index lookup. Hash table get (average case). Stack push/pop.

O(log n) — Logarithmic. The algorithm’s work increases by 1 unit each time n doubles. Binary search. B-tree lookup. Finding the height of a balanced binary tree.

O(n) — Linear. The algorithm visits each element a constant number of times. Linear scan. Finding the maximum of an unsorted array. Reading a file.

O(n log n) — Linearithmic. The dominant complexity class for comparison-based sorting. Merge sort. Heapsort. Fast Fourier Transform. The best possible for comparison-based sorting of arbitrary data (non-comparison sorts like radix and counting can do better under specific input assumptions).

O(n²) — Quadratic. The algorithm compares every element to every other element. Bubble sort. Insertion sort (worst case). Naive string matching.

O(2ⁿ) — Exponential. The algorithm’s runtime doubles with each additional element. Naive recursive Fibonacci. Generating all subsets of a set. Brute-force 0/1 knapsack — try every subset of items.

O(n!) — Factorial. The algorithm enumerates every ordering of n elements. Brute-force traveling salesman (try every permutation of cities). Generating all permutations for shuffle tests. Factorial is even worse than exponential — at n = 20 it is already over 10¹⁸.

Visual Intuition: What These Mean at Scale

The following table shows approximate operation counts for common complexity classes. At n = 1 billion, the difference between O(log n) and O(n²) is the difference between 30 operations and 1,000,000,000,000,000,000 (10¹⁸) operations.

Complexity	n = 1,000	n = 1,000,000	n = 1,000,000,000
O(1)	1	1	1
O(log n)	10	20	30
O(n)	1,000	1,000,000	1,000,000,000
O(n log n)	10,000	20,000,000	30,000,000,000
O(n²)	1,000,000	10¹²	10¹⁸
O(2ⁿ)	10³⁰¹	—	—

At 10⁹ operations per second (a rough estimate for a modern CPU), O(n) at n = 10⁹ takes about 1 second. O(n²) at n = 10⁹ takes about 31 years.

How It Works

Deriving Big-O from Code

Three rules cover most cases:

Rule 1: Simple loops. A loop that runs n times is O(n).

Rule 2: Nested loops. A loop inside a loop is O(n²). Three nested loops are O(n³). Each nesting level multiplies by n.

Rule 3: Divide-and-conquer recursion. A recursive call that makes two subproblems of half the input and does O(n) work at each level is O(n log n) — this is merge sort. A recursive call that makes one subproblem of half the input and does O(1) work at each level is O(log n) — this is binary search. The number of subproblems matters: one subproblem with O(n) merge work is only O(n), not O(n log n). The full treatment is in Chapter 19 (recurrence relations).

def demonstrate_complexity_classes(data: list[int]) -> None:
    """
    Demonstrates: deriving Big-O from loop structure.
    Each section is labeled with its complexity.
    """
    n = len(data)

    # O(1) — constant: no loop, fixed number of operations
    first_element = data[0] if data else None

    # O(log n) — binary search: input halves on each iteration
    low, high = 0, n - 1
    while low < high:
        # Demonstrating halving pattern only
        low = (low + high) // 2 + 1
        # In real binary search, this finds the target

    # O(n) — linear scan: visits each element once
    running_sum = 0
    for value in data:
        running_sum += value

    # O(n log n) — sort: this is Python's Timsort, guaranteed O(n log n)
    sorted_data = sorted(data)

    # O(n²) — nested loops: every pair of elements
    pair_count = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Compares every unique pair — n*(n-1)/2 comparisons
            pair_count += 1

    # O(2^n) — subset generation: 2^n subsets for n elements
    # Only safe for small n; do not call with n > 25
    def generate_subsets(elements: list[int]) -> list[list[int]]:
        if not elements:
            return [[]]
        first = elements[0]
        rest_subsets = generate_subsets(elements[1:])
        # Each existing subset appears twice: without first, and with first
        return rest_subsets + [[first] + subset for subset in rest_subsets]

Reading Complexity from Recursive Functions

For recursive functions, identify two things: how many recursive calls are made per call (branching factor), and how the input size changes per call.

def binary_search_recursive(sorted_array: list[int], target: int,
                             low: int, high: int) -> int:
    """
    Demonstrates: O(log n) recursive — one call per level, input halves each time.
    Recurrence: T(n) = T(n/2) + O(1)
    Solution: T(n) = O(log n)
    """
    if low > high:
        return -1  # Base case: search space exhausted

    mid = (low + high) // 2

    if sorted_array[mid] == target:
        return mid
    elif sorted_array[mid] < target:
        # One recursive call, on half the input — this is the halving step
        return binary_search_recursive(sorted_array, target, mid + 1, high)
    else:
        return binary_search_recursive(sorted_array, target, low, mid - 1)

One recursive call per level. Input halves. Depth of recursion: log₂(n). Total operations: O(log n).

Contrast with naive Fibonacci:

def fibonacci_naive(n: int) -> int:
    """
    Demonstrates: O(2^n) — two recursive calls per call, depth n.
    Do not call with n > 35; runtime grows exponentially.
    """
    if n <= 1:
        return n
    # Two recursive calls at every level — the call tree has 2^n nodes
    return fibonacci_naive(n - 1) + fibonacci_naive(n - 2)

Two recursive calls per call. Depth n. Total nodes in the call tree: approximately 2ⁿ. Each node does O(1) work. Total: O(2ⁿ).

Dropping Lower-Order Terms

When an algorithm has multiple phases, the dominant term determines the Big-O class.

An algorithm that runs O(n²) then O(n) steps is O(n²) overall. At large n, the O(n) phase is negligible.

def two_phase_algorithm(data: list[int]) -> list[int]:
    """
    Demonstrates: O(n²) + O(n) = O(n²) — dominated by the quadratic phase.
    """
    result = []

    # Phase 1: O(n²) — build all pairs
    for i in range(len(data)):
        for j in range(len(data)):
            result.append(data[i] + data[j])

    # Phase 2: O(n²) elements to sort — dominated by phase 1 creation
    # Phase 2 is O(n² log n²) = O(n² log n), which is worse than n² alone.
    # For the purposes of this demonstration, we stop after phase 1.
    return result

    # Total complexity: O(n²) — the nested loop dominates

Tradeoffs

AT9 (Correctness vs Performance): Big-O hides constants. An O(n log n) algorithm with a large constant factor can be slower than an O(n²) algorithm with a small constant factor — for all n that occur in practice. At n = 10⁹, treating O(n log n) as if it were O(n) is FM3 in practice. But treating O(n²) as equivalent to O(n log n) for n = 50 is an unnecessary optimization effort.

The practical rule: profile first, analyze second, optimize only where measurements show a problem.

The space-time duality (AT2 continued from Ch 17): Almost every technique for reducing time complexity increases space complexity. Memoization (Chapter 21) reduces O(2ⁿ) to O(n) time by paying O(n) space. Sorting reduces O(n) search to O(log n) search by paying O(n log n) time upfront. There is no free lunch.

Where It Fails

FM3 (Unbounded Resource Consumption) — treating O(n log n) as equivalent to O(n) at n = 10⁹ causes resource exhaustion.

At n = 10⁹: - O(n) at 10⁹ operations/second: ~1 second - O(n log n) at 10⁹ operations/second: ~30 seconds - O(n²) at 10⁹ operations/second: ~31 years

The difference between O(n) and O(n log n) is a factor of 30 at n = 10⁹. For most applications this is acceptable. For real-time systems with n = 10⁹, it is not.

The failure mode: an engineer sees “O(n log n) is close to O(n)” in a textbook and treats them as equivalent. At scale, they are not equivalent. At n = 10⁹, the factor-of-30 difference is the difference between a 1-second operation and a 30-second operation.

A second failure mode: Big-O analysis does not account for cache behavior. An O(n) algorithm with poor cache locality can be slower in practice than an O(n log n) algorithm with excellent cache behavior, for n that fits in CPU cache. The constant factor from cache misses dominates. This is why mergesort (O(n log n), but with extra memory traffic and write amplification from the auxiliary buffer) is often slower in practice than quicksort (O(n log n) expected, excellent cache locality) on modern hardware.

Real Systems

Elasticsearch query complexity: Elasticsearch documents the complexity of its query types. A term query is O(log n). A wildcard query is O(n) or worse depending on the pattern. A developer who replaces a term query with a wildcard query in a hot search path has changed the system’s complexity class — which becomes visible when the index grows from 1 million to 100 million documents.

Python’s sort: Python’s Timsort is O(n log n) worst case and O(n) best case on nearly-sorted data. The best-case behavior is intentional — real datasets are often partially sorted. Knowing this, Python developers use Timsort on partially-sorted inputs for O(n) performance without changing the algorithm. Understanding Big-O lets you reason about when these optimizations apply.

DNS TTL and cache invalidation: A DNS resolver that checks all cached entries on every request has O(n) lookup complexity where n is the number of cached entries. Resolvers that use hash tables have O(1) average lookup. The difference is invisible at 100 cached entries. At 10 million cached entries (a large anycast resolver), the O(n) design fails. Real DNS resolvers are designed as hash tables from the ground up because the scale requirement was known in advance.

Concept: Big-O notation — formal language for naming algorithmic growth rates

Core Idea: O(1) < O(log n) < O(n) < O(n log n) < O(n²) < O(2ⁿ). At n=10⁹, these differ by astronomical factors.

Tradeoff: AT9 — Big-O hides constants; an O(n log n) algorithm with large constants can be slower than O(n²) for all practical n values

Failure Mode: FM3 — treating O(n log n) as O(n) at n=10⁹ causes 30x resource overrun; treating O(n²) as acceptable at large n causes system failure

Signal: When “works fine in testing, fails in production”, the algorithm’s complexity class was not evaluated against production-scale n

Exercises

Level 1 — Understand

Rank the following complexity classes from fastest-growing to slowest-growing: O(n²), O(1), O(2ⁿ), O(n log n), O(n), O(log n). At n = 1,000, compute the approximate operation count for each.
An algorithm runs in exactly 3n² + 14n + 7 steps. What is its Big-O class? Explain why you discard the 14n + 7 terms.
Explain in one sentence why Big-O notation discards constant factors. Why is this useful despite being imprecise?

Level 2 — Apply

Derive the Big-O time complexity of each of the following code fragments. Show your reasoning.

# Fragment A
for i in range(n):
    for j in range(10):  # Note: j goes to 10, not n
        do_work()

# Fragment B
i = n
while i > 1:
    i = i // 2
    do_work()

# Fragment C
for i in range(n):
    for j in range(i, n):
        for k in range(j, n):
            do_work()

A sorting algorithm claims to be O(n log n). At n = 100, it takes 0.1ms. Predict its runtime at n = 1,000,000. At n = 1,000,000,000. Show your work using the ratio method: if T(n) = c · n log n, then T(m)/T(n) = (m log m)/(n log n).
The following function is claimed to be O(n). Identify the hidden O(n) operation that makes the actual complexity O(n²).

def find_unique(items: list[str]) -> list[str]:
    unique = []
    for item in items:
        if item not in unique:  # What is the complexity of this check?
            unique.append(item)
    return unique

Level 3 — Design

A team is building a feature that checks whether a username is taken. Option A: scan the list of all usernames on each check (O(n)). Option B: maintain a hash set of usernames and check membership (O(1) average). The current user count is 50,000. The service handles 10,000 username checks per second. At what user count does Option A become unacceptable if the latency budget per check is 1ms, and the list scan performs 10⁸ comparisons per second?

A complete answer will: define the chosen approach, quantify its cost or complexity, and identify what breaks first as scale or load increases.

Design a data structure that supports O(1) insert, O(1) delete, and O(1) get-random-element. Describe the structure, the operations, and prove each operation’s complexity.

A complete answer will: walk the reasoning step by step, cite the relevant tradeoff and failure-mode codes, and show the calculation or invariant that supports the conclusion.

A log aggregation pipeline ingests events from 500 servers. Each server produces 1,000 events per second. The pipeline must deduplicate events (same event ID seen from multiple servers due to retries). The current deduplication step compares each incoming event against a list of recently seen event IDs using linear scan. The deduplication window is 60 seconds. Compute the size of the seen-ID list at steady state. Derive the Big-O cost of deduplication per event and the total cost per second. Redesign the deduplication step to achieve O(1) per-event cost and specify the memory budget required for the seen-ID store.

A complete answer will: name the structure or technique chosen, state why an obvious alternative was rejected, and identify the conditions under which the choice ceases to hold.