The Computing Series

Exercises

Level 2 — Apply

A startup’s web application handles 500 RPS. Average latency is 20ms. P99 latency is 800ms. The engineering team is considering adding a cache to reduce P99.

  1. Apply Little’s Law. At 500 RPS with a 20ms mean latency, how many requests are in flight simultaneously? Given the P99 of 800ms, how many requests per second experience latency of 800ms or worse?

  2. The team proposes adding an in-memory cache with a 90% hit rate. Cached reads take 1ms; uncached reads take 200ms (database). Calculate the new mean latency and estimate the new P99.

  3. The cache is a single instance. Apply the seven diagnostic questions. Which questions does this design fail? What architectural change is required?
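Parts 1 and 2 above can be checked with a short numeric sketch. All figures come from the exercise statement; the P99 estimate additionally assumes database (miss) latency is roughly constant at 200ms, which is an assumption, not something the exercise states:

```python
# Little's Law: L = lambda * W (in-flight requests = arrival rate * mean latency)
rps = 500
mean_latency_s = 0.020
in_flight = rps * mean_latency_s          # 10 requests in flight on average

# P99 means 1% of requests are at or above 800 ms
tail_rps = rps * 0.01                     # 5 requests per second in the tail

# Cache proposal: 90% hits at 1 ms, 10% misses at 200 ms (database)
hit_rate = 0.90
new_mean_ms = hit_rate * 1 + (1 - hit_rate) * 200   # ~20.9 ms

# 10% of requests miss, which is larger than the 1% tail, so the new P99
# falls inside the miss population: roughly the 200 ms database latency.
est_p99_ms = 200
```

Note the counterintuitive result: the cache barely moves the mean (20ms vs. roughly 20.9ms under these assumed numbers) but can collapse the tail, because the P99 is now bounded by database latency rather than by whatever produced the 800ms outliers.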

Level 3 — Design

A social media platform currently handles 10,000 RPS with a single database and a single application server. The engineering team must design for 1,000,000 RPS within 12 months.

  1. Which of the three scaling axes (throughput, latency, storage) is the primary constraint? For each axis, identify what will break first at 100× load.

  2. Propose an architecture that handles 1,000,000 RPS. For each component you add, name the tradeoff it introduces using AT notation.

  3. The product requires that users always see their own writes immediately (read-your-own-write consistency). How does this constraint conflict with the architecture proposed in (2)? Propose a resolution and name the tradeoff.
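One resolution to part 3, sticky reads keyed by a user's recent writes, can be sketched as below. The class name, the window length, and the routing targets are illustrative assumptions, not from the book; a production version would carry the marker in a session token rather than in process memory:

```python
import time

class StickyReadRouter:
    """Route a user's reads to the primary for a short window after that
    user's last write, so they always observe their own writes even while
    replicas lag (read-your-own-write consistency)."""

    def __init__(self, stickiness_window_s=5.0):
        # Window should exceed worst-case replica lag; 5 s is a placeholder.
        self.window = stickiness_window_s
        self.last_write = {}  # user_id -> monotonic timestamp of last write

    def record_write(self, user_id):
        self.last_write[user_id] = time.monotonic()

    def route_read(self, user_id):
        t = self.last_write.get(user_id)
        if t is not None and time.monotonic() - t < self.window:
            return "primary"   # replica may lag; pay the primary-read cost
        return "replica"       # past the window: scale reads across replicas
```

The tradeoff this makes explicit: recent writers lose the read-scalability benefit of replicas for the duration of the window, concentrating that slice of read traffic back on the primary.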

A complete answer will:

  1. Identify the primary bottleneck at 100× load (the single database: write throughput) and name what breaks first on each axis: the database connection pool for throughput, replica lag for latency, and disk capacity for storage.

  2. Name at least two failure modes that each architectural addition introduces (e.g., FM4 stale data from read replicas, FM12 partition during cross-shard writes).

  3. Address the AT1 tradeoff between strong consistency for read-your-own-write (routing user reads to the primary) and the read scalability benefit of replica distribution.

  4. Propose a concrete resolution, such as sticky reads via a session token or reads from the primary with a timeout fallback to a replica, with the latency cost of the mechanism quantified.
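The latency cost of the primary-with-timeout-fallback resolution can be quantified with back-of-the-envelope arithmetic. Every number below is an illustrative assumption chosen to show the shape of the calculation, not a figure from the exercise:

```python
# Assumed figures: primary reads are slower than replica reads (cross-zone
# hop, contention with writes), and a small fraction of primary reads
# time out and fall back to a replica.
primary_ms = 5.0          # assumed primary read latency
replica_ms = 1.0          # assumed replica read latency
timeout_ms = 10.0         # budget before falling back to a replica
primary_fail_rate = 0.01  # assumed fraction of primary reads that time out

# Expected latency: the fast path most of the time, plus the full timeout
# and a replica read on the slow path.
expected_ms = (1 - primary_fail_rate) * primary_ms \
            + primary_fail_rate * (timeout_ms + replica_ms)  # ~5.06 ms
```

The structure of the formula is the point: the mechanism's cost is dominated by the primary's baseline latency, while the timeout only matters multiplied by the failure rate, so a generous timeout is cheap in expectation but widens the tail.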
