The Computing Series

Exercises

Level 1 — Understand

  1. What data structure does BFS use? What data structure does DFS use? How does the choice of data structure determine the traversal order?

  2. What is a GC root? Why does a garbage collector start its traversal from roots rather than from all objects?

  3. A dependency graph has a cycle: A depends on B and B depends on A. What happens when DFS traverses this graph without cycle detection? How does the two-set approach detect the cycle?
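The "two-set approach" in question 3 can be sketched as follows: one set tracks nodes on the current DFS path ("in progress"), the other tracks fully explored nodes ("done"). The dict-of-lists adjacency representation is an illustrative assumption, not a prescribed data structure:

```python
# Sketch of DFS cycle detection with two sets: "in_progress" holds
# nodes on the current DFS path, "done" holds fully explored nodes.
def has_cycle(graph):
    in_progress, done = set(), set()

    def dfs(node):
        if node in in_progress:   # back edge: node is on the current path
            return True
        if node in done:          # fully explored earlier, no cycle via here
            return False
        in_progress.add(node)
        for dep in graph.get(node, []):
            if dfs(dep):
                return True
        in_progress.remove(node)
        done.add(node)
        return False

    return any(dfs(n) for n in graph)

# A depends on B and B depends on A, as in question 3
print(has_cycle({"A": ["B"], "B": ["A"]}))  # True
```

Without the `in_progress` set, the same traversal would recurse A → B → A → … until the stack overflows, which is the failure the question asks about.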

Level 2 — Apply

A social network has 1 million users. Each user has on average 200 connections. You need to find all users reachable from user X within 3 “degrees of separation” (3 hops).

  1. Which traversal would you use? Justify with reference to the properties of BFS vs DFS.

  2. What is the maximum number of nodes in the BFS frontier at depth 3?

  3. The frontier at depth 3 exceeds available memory. Propose a modification to the traversal that bounds memory usage while still returning results.
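One way to frame questions 1 and 2 concretely is a depth-limited BFS. This is a sketch, assuming the network is held as an adjacency dict; at 1 million users it would need the streaming or bounding modifications question 3 asks for:

```python
from collections import deque

# Sketch of BFS limited to max_depth hops from start.
# Nodes at max_depth are collected but not expanded further.
def reachable_within(graph, start, max_depth):
    seen = {start}
    frontier = deque([(start, 0)])
    result = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                result.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return result
```

The `seen` set is what keeps the frontier from revisiting nodes; the memory question in part 3 is about the size of `frontier` and `seen`, not the traversal order.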

Level 3 — Design

You are building a dependency analysis service for a large monorepo with 50,000 modules. Engineers want to know: (a) what is the full transitive dependency set of module X, and (b) which modules transitively depend on module Y (reverse dependency lookup).

  1. Choose a traversal for each query. Design the data structures for forward and reverse dependency graphs.

  2. The dependency graph changes ~500 times per day as engineers modify module dependencies. How do you keep the forward and reverse graphs consistent on change?

  3. Query (a) is called 10,000 times per day; the same modules are queried repeatedly. Design a caching strategy for dependency sets. Name every tradeoff using AT notation.

A complete answer will:

  1. choose BFS for transitive forward dependencies and DFS for reverse lookups (or justify an alternative), with complexity stated for a 50,000-node graph;

  2. identify FM4 (stale data) as the failure mode when the cache serves dependency sets that have been invalidated by a graph change, and FM12 (network partition) as a risk when the reverse graph is updated asynchronously;

  3. address the AT4 tradeoff between query-time computation (always fresh) and cached results (fast but potentially stale);

  4. propose a concrete invalidation mechanism tied to the 500 daily graph changes that minimises cache thrashing.
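The paired forward/reverse structure behind queries (a) and (b) can be sketched as two adjacency maps updated together on every change. The class and method names here are illustrative assumptions, not the book's API:

```python
from collections import defaultdict, deque

# Sketch: forward and reverse dependency maps kept consistent by
# routing every edge change through one pair of update methods.
class DepGraph:
    def __init__(self):
        self.forward = defaultdict(set)   # module -> modules it depends on
        self.reverse = defaultdict(set)   # module -> modules that depend on it

    def add_dep(self, module, dep):
        self.forward[module].add(dep)
        self.reverse[dep].add(module)     # same change, mirrored

    def remove_dep(self, module, dep):
        self.forward[module].discard(dep)
        self.reverse[dep].discard(module)

    def transitive_deps(self, module):
        # Query (a): BFS over the forward map
        seen, queue = set(), deque([module])
        while queue:
            for dep in self.forward[queue.popleft()]:
                if dep not in seen:
                    seen.add(dep)
                    queue.append(dep)
        return seen
```

Updating both maps in one method is the simplest consistency mechanism; the asynchronous-update risk in rubric item 2 arises precisely when the two maps are maintained by separate services instead.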

Read in the book →