The Computing Series

Architecture Walkthrough

Step 1: Separate Functional from Non-Functional Requirements

Functional requirements describe what the system does. Non-functional requirements describe how well.

Functional (URL shortener):
  - User submits a long URL, receives a short code
  - User submits a short code, is redirected to the long URL
  - User can view click statistics for their URLs

Non-functional:
  - Redirect latency: p99 < 20ms
  - Availability: 99.99% (about 52.6 minutes/year downtime)
  - Read QPS: 10,000 redirects/sec at peak
  - Write QPS: 100 new URLs/sec
  - Data retention: 5 years
  - Consistency: eventual acceptable for click counts

Non-functional requirements are where architecture decisions live. The latency target of 20ms at p99 strongly implies caching: a redirect path that queries the database on every request is unlikely to hold that tail latency at 10,000 QPS. The 99.99% availability target implies redundancy and no single point of failure.
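The availability figure above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `downtime_budget_minutes` is illustrative, and it assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_budget_minutes(availability_pct: float) -> float:
    """Return the minutes per year a service may be down at this availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Each extra "nine" cuts the yearly downtime budget by a factor of ten.
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {downtime_budget_minutes(target):.1f} min/year")
```

At 99.99% the budget is roughly 52.6 minutes per year, which leaves little room for a manual failover or a maintenance window that takes the service offline.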

Step 2: Back-of-Envelope Estimation

Estimation relies on four sets of standard conversions and reference figures:

Time:    1 day = 86,400 seconds ≈ 10^5
Storage: 1 char ≈ 1 byte; 1 KB = 10^3 bytes; 1 MB = 10^6; 1 GB = 10^9; 1 TB = 10^12
Network: typical server NIC = 1–10 Gbps = 125 MB/s – 1.25 GB/s
Memory:  typical server RAM = 64–256 GB

QPS formula: events_per_day / 86,400
Storage formula: records_per_day × record_size × retention_days
Bandwidth formula: QPS × average_response_size
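The three formulas translate directly into small helper functions. This is a sketch only: the function names are illustrative, and the example treats the URL shortener's 100 new URLs/sec as a sustained average with an assumed 500-byte record, neither of which is stated in the requirements.

```python
SECONDS_PER_DAY = 86_400


def qps(events_per_day: float) -> float:
    """Average queries per second for a daily event count."""
    return events_per_day / SECONDS_PER_DAY


def storage_bytes(records_per_day: int, record_size_bytes: int, retention_days: int) -> int:
    """Total bytes retained over the retention window."""
    return records_per_day * record_size_bytes * retention_days


def bandwidth_bytes_per_sec(qps_value: float, avg_response_bytes: int) -> float:
    """Outbound bytes per second at a given request rate."""
    return qps_value * avg_response_bytes


if __name__ == "__main__":
    urls_per_day = 100 * SECONDS_PER_DAY           # assumes 100/sec is the average
    total = storage_bytes(urls_per_day, 500, 365 * 5)  # 500 bytes/record is assumed
    print(f"{total / 1e12:.1f} TB over 5 years")
```

Under those assumptions the shortener stores a little under 8 TB in five years, which is the kind of order-of-magnitude answer estimation is meant to produce.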

Worked example for a messaging system with 100M DAU:

Write QPS:
  100M users × 10 messages/day / 86,400 ≈ 11,600 writes/sec, rounded up to 12,000

Read QPS (read:write ratio = 10:1):
  12,000 × 10 = 120,000 reads/sec

Storage per message: 1 KB (text + metadata)
Daily storage: 12,000 × 86,400 × 1 KB = 1,036,800,000 KB ≈ 1 TB/day
5-year storage: 1 TB × 365 × 5 = 1,825 TB ≈ 1.8 PB

Bandwidth (reads): 120,000 reads/sec × 1 KB = 120 MB/s

These numbers immediately constrain the architecture. 1.8 PB rules out a single server. 120,000 reads/sec rules out an uncached database. 12,000 writes/sec rules out synchronous write acknowledgement on anything slower than an SSD-backed system.
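The same worked example can be run without the intermediate rounding, as a sanity check that rounding to 12,000 does not change the conclusions. The sketch below uses the inputs from the example and decimal units (1 KB = 1,000 bytes); the variable names are illustrative. It also compares read bandwidth with the 1 Gbps NIC figure from the conversion table.

```python
SECONDS_PER_DAY = 86_400

DAU = 100_000_000
MESSAGES_PER_USER_PER_DAY = 10
READ_WRITE_RATIO = 10
MESSAGE_BYTES = 1_000            # 1 KB per message, decimal units
RETENTION_DAYS = 365 * 5
NIC_1GBPS_BYTES_PER_SEC = 125_000_000

messages_per_day = DAU * MESSAGES_PER_USER_PER_DAY
write_qps = messages_per_day / SECONDS_PER_DAY
read_qps = write_qps * READ_WRITE_RATIO
daily_bytes = messages_per_day * MESSAGE_BYTES
total_bytes = daily_bytes * RETENTION_DAYS
read_bandwidth = read_qps * MESSAGE_BYTES

if __name__ == "__main__":
    print(f"write QPS:       {write_qps:,.0f}")
    print(f"read QPS:        {read_qps:,.0f}")
    print(f"daily storage:   {daily_bytes / 1e12:.2f} TB")
    print(f"5-year storage:  {total_bytes / 1e15:.3f} PB")
    print(f"read bandwidth:  {read_bandwidth / 1e6:.1f} MB/s")
    print(f"share of 1 Gbps: {read_bandwidth / NIC_1GBPS_BYTES_PER_SEC:.0%}")
```

Unrounded, the totals land within a few percent of the rounded figures. Read traffic alone would take most of a single 1 Gbps NIC, which is one more reason the read path has to be spread across machines.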

Step 3: Characterise the Read/Write Profile

The read/write ratio determines the primary architectural pattern.

Read-heavy (ratio > 10:1): caching is essential. Read replicas reduce load on the primary. CDN handles static or slowly-changing data. The design optimises for fast reads and tolerates slightly higher write latency.

Write-heavy (ratio < 2:1): buffered write paths matter. Async queues absorb write spikes. Append-only logs are cheaper than random-write databases. The design optimises for write throughput and may batch or delay reads.

Balanced (ratio 2:1 to 10:1): no single optimisation dominates. OLTP databases handle this well up to a point. Horizontal sharding is the first evolution step.
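The three profiles reduce to a threshold check on the read/write ratio. A minimal sketch, using the thresholds given above; the function name `classify_profile` is illustrative.

```python
def classify_profile(read_qps: float, write_qps: float) -> str:
    """Classify a workload by its read/write ratio."""
    if write_qps <= 0:
        raise ValueError("write_qps must be positive")
    ratio = read_qps / write_qps
    if ratio > 10:
        return "read-heavy"
    if ratio < 2:
        return "write-heavy"
    return "balanced"  # 2:1 up to and including 10:1


if __name__ == "__main__":
    # URL shortener: 10,000 redirects/sec against 100 new URLs/sec.
    print(classify_profile(10_000, 100))
    # Messaging example: 120,000 reads/sec against 12,000 writes/sec.
    print(classify_profile(120_000, 12_000))
```

The URL shortener sits at 100:1 and is clearly read-heavy. The messaging example sits exactly on the 10:1 boundary, so it classifies as balanced here; workloads near a threshold deserve a closer look rather than a mechanical label.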

Step 4: Determine Consistency Requirements

Consistency requirements determine which storage and replication strategies are admissible.

Strong consistency: every read sees the most recent write. Required for financial transactions, inventory counts, authentication state. Forces synchronous replication. Rules out multi-region active-active without coordination.

Eventual consistency: reads may see stale data for a short period, and replicas converge once updates have propagated. Acceptable for social feeds, analytics, recommendation scores, notification badges. Enables async replication, higher availability, lower write latency.

Strong consistency → synchronous replication → higher write latency
Eventual consistency → async replication → potential stale reads
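The trade-off in the two lines above can be seen in a toy model of one primary and one replica. This is an illustrative sketch, not a real replication protocol: the class names and the latency numbers (1 ms local write, 5 ms replica round trip) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    data: dict = field(default_factory=dict)


@dataclass
class Primary:
    replica: Replica
    local_write_ms: float = 1.0   # hypothetical local commit cost
    replica_rtt_ms: float = 5.0   # hypothetical round trip to the replica
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def write_sync(self, key, value) -> float:
        """Acknowledge only after the replica has the value."""
        self.data[key] = value
        self.replica.data[key] = value
        return self.local_write_ms + self.replica_rtt_ms

    def write_async(self, key, value) -> float:
        """Acknowledge after the local write; replicate later."""
        self.data[key] = value
        self.pending.append((key, value))
        return self.local_write_ms

    def flush(self) -> None:
        """Apply queued writes to the replica (the replication lag ends here)."""
        for key, value in self.pending:
            self.replica.data[key] = value
        self.pending.clear()


replica = Replica()
primary = Primary(replica)

sync_latency = primary.write_sync("balance", 100)    # replica is current on return
async_latency = primary.write_async("likes", 42)     # replica not yet updated
stale_read = replica.data.get("likes")               # None: the write has not arrived
primary.flush()
fresh_read = replica.data.get("likes")               # 42 once replication catches up
```

The synchronous write pays the replica round trip on every acknowledgement, while the asynchronous write returns sooner but leaves a window in which a read from the replica misses the update.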

Step 5: Map to Architecture Constraints

From the characterised requirements, derive constraints:

If p99 read latency < 50ms AND read QPS ≥ 10,000 → caching required
If write QPS > 5,000 → consider write buffering or async ingest
If storage > 10 TB → sharding or distributed storage required
If availability ≥ 99.99% → no SPOF, redundancy at every layer
If strong consistency required → cannot use async replication

These constraints eliminate architecture classes and leave a narrow space of viable designs.
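The rules can be encoded as a small function that takes characterised requirements and returns the constraints they trigger. A sketch under stated assumptions: the `Requirements` fields and `derive_constraints` are illustrative names, the read-QPS and availability thresholds are treated as inclusive so that the URL shortener's targets of exactly 10,000 QPS and 99.99% trigger their rules as Step 1 argues, and the 8 TB storage figure is assumed rather than taken from the requirements.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    p99_read_latency_ms: float
    read_qps: float
    write_qps: float
    storage_tb: float
    availability_pct: float
    strong_consistency: bool


def derive_constraints(req: Requirements) -> list[str]:
    """Map characterised requirements to architecture constraints."""
    constraints = []
    if req.p99_read_latency_ms < 50 and req.read_qps >= 10_000:
        constraints.append("caching required")
    if req.write_qps > 5_000:
        constraints.append("consider write buffering or async ingest")
    if req.storage_tb > 10:
        constraints.append("sharding or distributed storage required")
    if req.availability_pct >= 99.99:
        constraints.append("no SPOF, redundancy at every layer")
    if req.strong_consistency:
        constraints.append("cannot use async replication")
    return constraints


url_shortener = Requirements(
    p99_read_latency_ms=20,
    read_qps=10_000,
    write_qps=100,
    storage_tb=8,              # assumed; not given in the requirements
    availability_pct=99.99,
    strong_consistency=False,  # eventual consistency is acceptable for click counts
)

if __name__ == "__main__":
    for constraint in derive_constraints(url_shortener):
        print(constraint)
```

For the URL shortener this yields two constraints, caching and redundancy at every layer, which matches the reasoning in Step 1.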
