From Requirements to Architecture

Introduction

A product manager walks into the sprint planning meeting and says: “We need to support ten times the current user load by Q3.” Three engineers hear three different things. One thinks about database indexing. One thinks about horizontal scaling. One thinks about caching. All three are correct — and none of them has enough information to start building.

The gap between a product requirement and an architecture decision is not technical. It is analytical. Before a single component is chosen, the system’s behaviour must be characterised numerically: how many reads per second, how many writes, how much data, how consistent, how available, how fast. Requirements without numbers are not actionable. This chapter closes that gap with a systematic translation process.

The Problem

Given a product description, produce a quantified characterisation of the system’s demands: read and write QPS, storage volume, bandwidth, consistency requirements, and latency targets. From this characterisation, determine the architectural constraints that eliminate whole categories of solutions and narrow the design space to a tractable set of choices.

The challenge is that product requirements are stated in user terms (“users can post and read messages”) and must be translated into engineering terms (“12,000 write QPS, 120,000 read QPS, 1.8 PB storage over five years”) before any meaningful architecture work can begin.

Naive Approach and Why It Fails

The naive approach is to skip estimation and design for “scale” in the abstract — adding a load balancer, a caching layer, and read replicas because “that’s what scalable systems look like.” This produces an architecture that looks sophisticated but may be wildly over- or under-engineered for the actual load.

Over-engineering at low scale adds operational complexity that slows the team down. Under-engineering at high scale produces systems that fall over under real load. A URL shortener does not need a distributed Cassandra cluster at ten thousand daily active users. A social network with one hundred million DAU cannot survive on a single relational database.

The second failure: ignoring the read/write ratio. A system that is 95% reads requires a completely different architecture from one that is 95% writes. Read-heavy systems benefit from caching and read replicas. Write-heavy systems benefit from async ingest, write buffers, and append-only logs. Treating both the same produces a design that serves neither well.

Architecture Walkthrough

Step 1: Separate Functional from Non-Functional Requirements

Functional requirements describe what the system does. Non-functional requirements describe how well.

Functional (URL shortener):
  - User submits a long URL, receives a short code
  - User submits a short code, is redirected to the long URL
  - User can view click statistics for their URLs

Non-functional:
  - Redirect latency: p99 < 20ms
  - Availability: 99.99% (52 minutes/year downtime)
  - Read QPS: 10,000 redirects/sec at peak
  - Write QPS: 100 new URLs/sec
  - Data retention: 5 years
  - Consistency: eventual acceptable for click counts

Non-functional requirements are where architecture decisions live. The latency target of 20ms at p99 points strongly toward caching: a redirect path that hits the database on every request is unlikely to hold that tail latency at 10,000 QPS. The 99.99% availability target implies redundancy and no single points of failure.
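The availability figure in the list above can be turned into a downtime budget with one line of arithmetic. The following is a minimal sketch, assuming a 365-day year; the function name is illustrative and not part of any library.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, assuming a 365-day year


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the allowed downtime per year, in minutes, for an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


# Each extra "nine" cuts the budget by a factor of ten.
for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_budget_minutes(target):,.1f} minutes/year")
```

For 99.99% this gives roughly 52.6 minutes per year, which is the figure quoted in the requirement.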

Step 2: Back-of-Envelope Estimation

Estimation uses four standard conversions:

Time:    1 day = 86,400 seconds ≈ 10^5
Storage: 1 char ≈ 1 byte; 1 KB = 10^3 bytes; 1 MB = 10^6; 1 GB = 10^9; 1 TB = 10^12
Network: typical server NIC = 1–10 Gbps = 125 MB/s – 1.25 GB/s
Memory:  typical server RAM = 64–256 GB

QPS formula: events_per_day / 86,400
Storage formula: records_per_day × record_size × retention_days
Bandwidth formula: QPS × average_response_size

Worked example for a messaging system with 100M DAU:

Write QPS:
  100M users × 10 messages/day / 86,400 ≈ 11,600 writes/sec ≈ 12,000

Read QPS (read:write ratio = 10:1):
  12,000 × 10 = 120,000 reads/sec

Storage per message: 1 KB (text + metadata)
Daily storage: 12,000 × 86,400 × 1 KB = 1,036,800,000 KB ≈ 1 TB/day
5-year storage: 1 TB × 365 × 5 = 1,825 TB ≈ 1.8 PB

Bandwidth (reads): 120,000 reads/sec × 1 KB = 120 MB/s

These numbers immediately constrain the architecture. 1.8 PB rules out a single server. 120,000 reads/sec rules out an uncached database. 12,000 writes/sec rules out synchronous write acknowledgement on anything slower than an SSD-backed system.
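The worked example can be reproduced in a few lines, which makes it easy to rerun when an input assumption changes. This is a sketch of the three formulas from this step; the function and variable names are illustrative, and decimal units (1 KB = 10^3 bytes) are assumed, as in the conversion list.

```python
SECONDS_PER_DAY = 86_400
KB = 10**3   # decimal units, matching the conversion list
MB = 10**6
TB = 10**12


def qps(events_per_day: float) -> float:
    """QPS formula: events_per_day / 86,400."""
    return events_per_day / SECONDS_PER_DAY


def storage_bytes(records_per_day: float, record_size_bytes: float, retention_days: float) -> float:
    """Storage formula: records_per_day x record_size x retention_days."""
    return records_per_day * record_size_bytes * retention_days


def bandwidth_bytes_per_sec(qps_value: float, avg_response_bytes: float) -> float:
    """Bandwidth formula: QPS x average_response_size."""
    return qps_value * avg_response_bytes


# Inputs from the messaging example.
dau = 100_000_000
messages_per_user_per_day = 10
read_write_ratio = 10
message_size = 1 * KB
retention_days = 365 * 5

messages_per_day = dau * messages_per_user_per_day
write_qps = qps(messages_per_day)
read_qps = write_qps * read_write_ratio
daily_storage = storage_bytes(messages_per_day, message_size, 1)
total_storage = storage_bytes(messages_per_day, message_size, retention_days)
read_bandwidth = bandwidth_bytes_per_sec(read_qps, message_size)

print(f"write QPS:      {write_qps:,.0f}")
print(f"read QPS:       {read_qps:,.0f}")
print(f"daily storage:  {daily_storage / TB:,.2f} TB")
print(f"5-year storage: {total_storage / TB:,.0f} TB")
print(f"read bandwidth: {read_bandwidth / MB:,.1f} MB/s")
```

The script works from unrounded values, so it reports about 11,574 writes/sec and about 116 MB/s of read bandwidth. The text rounds write QPS up to 12,000 before multiplying, which is why it arrives at 120,000 reads/sec and 120 MB/s. For back-of-envelope work the difference does not matter.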

Step 3: Characterise the Read/Write Profile

The read/write ratio determines the primary architectural pattern.

Read-heavy (ratio > 10:1): caching is essential. Read replicas reduce load on the primary. CDN handles static or slowly-changing data. The design optimises for fast reads and tolerates slightly higher write latency.

Write-heavy (ratio < 2:1): buffered write paths matter. Async queues absorb write spikes. Append-only logs are cheaper than random-write databases. The design optimises for write throughput and may batch or delay reads.

Balanced (ratio 2:1 to 10:1): no single optimisation dominates. OLTP databases handle this well up to a point. Horizontal sharding is the first evolution step.
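The three bands can be written as a small classifier. This is a sketch using the thresholds stated above; the function name is hypothetical, and the cut-offs are rules of thumb rather than hard limits.

```python
def classify_profile(read_qps: float, write_qps: float) -> str:
    """Classify a workload by read/write ratio using the bands from this step."""
    if write_qps <= 0:
        raise ValueError("write_qps must be positive")
    ratio = read_qps / write_qps
    if ratio > 10:
        return "read-heavy"
    if ratio < 2:
        return "write-heavy"
    return "balanced"


print(classify_profile(10_000, 100))       # URL shortener: 100:1
print(classify_profile(120_000, 12_000))   # messaging example: exactly 10:1
```

The URL shortener (10,000 reads to 100 writes) is clearly read-heavy. The messaging example sits exactly on the 10:1 boundary, so these thresholds class it as balanced; a workload that close to a boundary deserves a second look rather than a mechanical answer.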

Step 4: Determine Consistency Requirements

Consistency requirements determine which storage and replication strategies are admissible.

Strong consistency: every read sees the most recent write. Required for financial transactions, inventory counts, authentication state. Forces synchronous replication. Rules out multi-region active-active without coordination.

Eventual consistency: reads may see stale data for a bounded time. Acceptable for social feeds, analytics, recommendation scores, notification badges. Enables async replication, higher availability, lower write latency.

Strong consistency → synchronous replication → higher write latency
Eventual consistency → async replication → potential stale reads

Step 5: Map to Architecture Constraints

From the characterised requirements, derive constraints:

If p99 read latency < 50ms AND read QPS > 10,000 → caching required
If write QPS > 5,000 → consider write buffering or async ingest
If storage > 10 TB → sharding or distributed storage required
If availability > 99.99% → no SPOF, redundancy at every layer
If strong consistency required → cannot use async replication

These constraints eliminate architecture classes and leave a narrow space of viable designs.
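The rules above are simple enough to encode directly, which keeps the derivation repeatable as the numbers change. The sketch below mirrors the five rules as written; the Requirements class, its field names, and the latency and availability values given for the messaging example are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    p99_read_latency_ms: float
    read_qps: float
    write_qps: float
    storage_tb: float
    availability_percent: float
    strong_consistency: bool


def derive_constraints(req: Requirements) -> list[str]:
    """Apply the Step 5 rules and return the constraints that fire."""
    constraints = []
    if req.p99_read_latency_ms < 50 and req.read_qps > 10_000:
        constraints.append("caching required")
    if req.write_qps > 5_000:
        constraints.append("consider write buffering or async ingest")
    if req.storage_tb > 10:
        constraints.append("sharding or distributed storage required")
    if req.availability_percent > 99.99:
        constraints.append("no SPOF, redundancy at every layer")
    if req.strong_consistency:
        constraints.append("cannot use async replication")
    return constraints


# Messaging example from Step 2. The 40 ms latency target, 99.95% availability,
# and eventual consistency are assumed values, not taken from the text.
messaging = Requirements(
    p99_read_latency_ms=40,
    read_qps=120_000,
    write_qps=12_000,
    storage_tb=1_825,
    availability_percent=99.95,
    strong_consistency=False,
)

for constraint in derive_constraints(messaging):
    print("-", constraint)
```

Under those assumptions, three rules fire for the messaging system: caching, write buffering, and distributed storage.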

Key Design Decisions

AT1 — Consistency/Availability: The choice between strong and eventual consistency is the highest-leverage decision in distributed system design. Strong consistency requires coordination across replicas, which limits availability during network partitions. Eventual consistency accepts temporary inconsistency in exchange for availability and lower latency.

AT2 — Latency/Throughput: Latency and throughput trade against each other at every layer. Batching increases throughput by amortising fixed costs, but increases latency for individual requests. Caching reduces latency but adds write-path complexity and staleness risk.
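A toy cost model makes the batching trade concrete. Assume each batch pays a fixed overhead (a network round trip or a disk sync, for instance) plus a small per-item cost. The 5 ms and 0.1 ms figures below are hypothetical and chosen only to show the shape of the curve.

```python
def batch_stats(batch_size: int, fixed_cost_ms: float, per_item_ms: float) -> tuple[float, float]:
    """Return (throughput in items/sec, time in ms to complete one batch)."""
    batch_time_ms = fixed_cost_ms + batch_size * per_item_ms
    throughput = batch_size / batch_time_ms * 1000
    return throughput, batch_time_ms


# Hypothetical costs: 5 ms fixed overhead per batch, 0.1 ms per item.
for size in (1, 10, 100):
    throughput, latency = batch_stats(size, fixed_cost_ms=5.0, per_item_ms=0.1)
    print(f"batch={size:>3}  throughput={throughput:,.0f}/s  batch time={latency:.1f} ms")
```

Throughput rises with batch size because the fixed cost is spread over more items, while every item in the batch waits for the whole batch to complete. The model ignores the time spent waiting for a batch to fill, which adds further latency in practice.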

AT4 — Precomputation/On-Demand: Read-heavy systems often precompute results — materialised views, cached feeds, prerendered pages. On-demand computation is simpler but cannot scale to the highest read QPS without paying the compute cost on every request.

Failure Modes in This System

FM4 — Data Consistency Failure: Choosing eventual consistency without understanding the consequences leads to anomalies users experience directly — stale inventory counts, duplicate charges, disappeared messages. Specify the acceptable staleness window before choosing eventual consistency.
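One way to enforce this is to make the staleness window a required field wherever eventual consistency is declared. The following is a sketch only; the class, its fields, and the 60-second window are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConsistencyRequirement:
    data_class: str
    strong: bool
    max_staleness_seconds: Optional[float] = None

    def __post_init__(self) -> None:
        # Eventual consistency without a stated bound is rejected outright.
        if not self.strong and self.max_staleness_seconds is None:
            raise ValueError(
                f"{self.data_class}: eventual consistency needs an explicit staleness window"
            )


# Hypothetical declarations for the URL shortener.
click_counts = ConsistencyRequirement("click_counts", strong=False, max_staleness_seconds=60)
url_mapping = ConsistencyRequirement("url_mapping", strong=True)
```

Declaring eventual consistency with no window raises an error at construction time, so the question of how stale is acceptable has to be answered before the design proceeds.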

FM3 — Unbounded Resource Consumption: Underestimating storage growth. Storage that looks manageable at launch becomes unmanageable in two years if the estimation was too optimistic. Build deletion policies, retention limits, and archival tiers into the design from the start.

FM6 — Hotspotting: Uniform distribution assumptions fail in practice. Users, content, and geographic load are never uniformly distributed. Estimation must account for peak-to-average ratios, not just averages. A system designed for average load fails at peak.
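The effect of sizing for the average rather than the peak is easy to quantify. In the sketch below, the peak-to-average factor of 3 and the per-server capacity of 2,000 QPS are assumed values for illustration; real figures come from traffic data and load tests.

```python
import math


def peak_qps(average_qps: float, peak_to_average: float) -> float:
    """Scale an average rate by an observed or assumed peak-to-average ratio."""
    return average_qps * peak_to_average


def servers_needed(qps: float, per_server_qps: float) -> int:
    """Round up: a fractional server is still a whole server."""
    return math.ceil(qps / per_server_qps)


average_write_qps = 12_000   # from the messaging example
assumed_peak_factor = 3      # assumption, not from the text
assumed_capacity = 2_000     # assumed QPS one server can sustain

print("sized for average:", servers_needed(average_write_qps, assumed_capacity))
print("sized for peak:   ", servers_needed(peak_qps(average_write_qps, assumed_peak_factor), assumed_capacity))
```

With these assumptions the fleet sized for the average has one third of the capacity the peak demands, and that is before any skew across individual users or partitions is taken into account.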

How It Evolves at Scale

At 10×: the database write path is the bottleneck. Sharding by user ID or content ID distributes writes. The read path moves toward dedicated read replicas. Consistency guarantees become harder to maintain across shards.

At 100×: data volume requires tiered storage — hot data in fast databases, warm data in object storage, cold data archived. The query model changes: not all historical data can be served at the same latency target as recent data.

Real-World Variants

Google’s Capacity Planning treats back-of-envelope estimation as a discipline, not an approximation. Their Site Reliability Engineering book documents the estimation process used before any major feature launch.

Amazon’s Six-Pager requires written characterisation of scale requirements before any system is designed. The document includes estimated load, growth projections, and failure scenarios — all non-functional requirements made explicit.

Stripe’s RFC Process requires that any system design document state the read and write QPS, storage requirements, and latency targets as the first section. Architecture decisions are not evaluated without those numbers.

Twitter’s Capacity Model (documented in their engineering blog) showed that the fan-out problem — a celebrity post triggering millions of feed updates — was invisible in back-of-envelope estimates that used average follower counts. The p99.9 case, not the average, drove the architecture.

Concept: Requirements to Architecture Translation

Thread: T12 (Tradeoffs) ← Ch 1 (Design Process) → Ch 3 (Distributed KV Store)

Core Idea: Product requirements must be translated into quantified engineering constraints — QPS, storage, bandwidth, consistency level — before any architecture decision can be evaluated. Numbers eliminate whole classes of solutions.

Tradeoff: AT1 — Consistency vs Availability: the highest-leverage decision, made once the consistency requirement is quantified from product requirements.

Failure Mode: FM4 — Data Consistency Failure: choosing eventual consistency without specifying the acceptable staleness window leads to anomalies users observe directly.

Signal: When a product requirement lacks explicit scale or consistency targets and an architecture decision must be made.

Maps to: Book 0, Framework 6 (System Archetypes)

Exercises

Level 1 — Understand

What formula is used to convert events per day into QPS, and what standard approximation for seconds-per-day enables back-of-envelope estimation?
Name the three read/write ratio classifications (read-heavy, write-heavy, balanced) and identify one architectural optimisation that is correct for each.
What is the difference between strong consistency and eventual consistency? Give one example of a system requirement that demands strong consistency and one that can accept eventual consistency.

Level 2 — Apply

A social photo-sharing app has 20M DAU. Each user views 50 photos per day and posts 1 photo per week. Average photo size is 3 MB. Calculate: (a) read QPS for photo serving, (b) write QPS for photo uploads, (c) daily storage added, (d) 3-year total storage. State whether the read/write ratio suggests a read-heavy or write-heavy architecture.
A bank’s transaction service requires that account balances are always correct — a user must never see a balance that does not reflect all committed transactions. Classify this as strong or eventual consistency. What AT code describes the tradeoff the database must make to provide this guarantee? What FM code describes the failure if this guarantee is violated?

Level 3 — Design

You are designing a real-time analytics dashboard for an e-commerce platform. Requirements: 500M events per day ingested (clicks, page views, purchases), dashboard queries must return in under 2 seconds, data must be accurate within 5 minutes of the event, retention is 2 years. Perform a full back-of-envelope estimation. Characterise the read/write profile. State the consistency requirement with AT code. Identify the two most significant architectural constraints that the numbers impose, and state one design decision that each constraint forces.

A complete answer will: (1) produce a correct back-of-envelope estimate (500M events/day ≈ 5,800 writes/sec on average, with peaks higher; 2-year retention at ~500 bytes/event ≈ 250 GB/day × 730 days ≈ 183 TB) and classify the workload as write-heavy ingest with read-heavy query (high read/write asymmetry), (2) state AT1 (Consistency/Availability) as the governing tradeoff and justify accepting eventual consistency (5-minute accuracy window) over strong consistency to avoid blocking ingest at 5,800 writes/sec, (3) name at least two FM codes that arise: FM3 (resource exhaustion on the ingest pipeline at a sustained 5,800 writes/sec with traffic spikes) and FM6 (hotspot on time-series data for the current hour’s partition under bursty event load), and (4) map each constraint to a concrete design decision: the write-rate constraint forces a streaming ingest layer (e.g., Kafka) decoupled from the query store, and the 183 TB retention constraint forces tiered storage (hot store for recent data, cold object store for archival) with a stated partition or compaction strategy.
