Show an engineer a box-and-arrow diagram labelled "System Architecture" and they will nod and say it looks reasonable. Show them five specific diagrams — each asking a different question about the same system — and they will find at least two problems the first diagram hid.
The five architecture diagrams are not different ways to draw the same picture. They answer five different questions, and each answer is only visible from a specific angle.
Why One Diagram Is Never Enough
A single architecture diagram tries to show the request path, the data storage model, the async dependencies, the pipeline stages, and the consensus topology simultaneously. If it includes everything, it becomes too complex to read. If it stays readable, it hides the details that matter.
The solution is not a better single diagram. It is five focused diagrams, each answering exactly one question.
| # | Diagram | The question it answers |
|---|---|---|
| D1 | Request Flow | What is the path of a request? What is the latency at each hop? |
| D2 | Data Storage | Where is all the data? What is the consistency model for each store? |
| D3 | Event-Driven / Async Coupling | What are the asynchronous dependencies and their delivery guarantees? |
| D4 | Data Pipeline | How does data flow from raw events to queryable results? What is the freshness? |
| D5 | Distributed Coordination | Where does the system require multiple nodes to agree? What happens during a partition? |
Each diagram reveals one failure surface that the others cannot show.
D1 — Request Flow
The question: For a user-facing operation, what is the path of a request from the client to the data and back? What is the latency at each hop? What happens if any component fails?
A request flow diagram shows every component a request touches, in sequence, with latency annotated on each hop. It shows the happy path and at least one failure path — what happens when a downstream component is slow or unavailable.
This diagram surfaces latency amplification: chains of sequential hops that look acceptable individually but add up to a P99 that breaks the latency budget. The critical path is the sequential chain with the largest total latency, not necessarily the one with the most hops.
The rule: Always show the failure path. A request flow diagram that only shows the happy path is incomplete. The failure path reveals whether the system fails gracefully (returns a degraded response) or fails catastrophically.
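A graceful failure path can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: `get_profile`, `slow_db`, the dict-based cache, and the `"fresh"`/`"stale"` status labels are all hypothetical names, and a slow database is modelled simply as a `TimeoutError`.

```python
def get_profile(user_id, cache, fetch_from_db):
    """Return (profile, status); degrade to the cached copy when the DB is slow."""
    try:
        profile = fetch_from_db(user_id)
        cache[user_id] = profile          # refresh the cache on success
        return profile, "fresh"
    except TimeoutError:
        if user_id in cache:
            # Degraded response: stale data instead of an error.
            return cache[user_id], "stale"
        raise                             # nothing to fall back to: fail loudly


def slow_db(user_id):
    # Hypothetical stand-in for a database call that exceeds its deadline.
    raise TimeoutError("database did not answer in time")


cache = {"u1": {"name": "Ada"}}
profile, status = get_profile("u1", cache, slow_db)
print(status, profile)
```

The point of returning a status alongside the data is that the degraded case stays observable: a caller or a metric can tell a stale answer from a fresh one.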
Example — API request for a user profile:
Client ──20ms──▶ CDN ──5ms──▶ Load Balancer ──2ms──▶ App Server
                                                         │
                                                         3ms ──▶ Cache (hit)
                                                        15ms ──▶ PostgreSQL (miss)
                                                         │
                                        [failure path]   │
                                        DB slow ──▶ return stale cached profile
                                                    (not an error)

Happy path: ~45ms    P99: ~120ms (cache miss)
The sequential chain from client to database is the critical path. The failure path shows the system degrades to stale data rather than returning an error — a deliberate design choice that must be visible on the diagram.
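The hop annotations can be checked against a latency budget mechanically. The sketch below sums the per-hop values from the diagram; the 50 ms budget and all identifier names are assumptions for illustration. It only adds typical per-hop figures, so it says nothing about the P99, which depends on the tail of each hop's distribution.

```python
# Per-hop latencies in milliseconds, copied from the example diagram.
HOPS_MS = {
    "client->cdn": 20,
    "cdn->load_balancer": 5,
    "load_balancer->app_server": 2,
    "app_server->cache": 3,
    "app_server->postgresql": 15,
}

CACHE_HIT_PATH = [
    "client->cdn",
    "cdn->load_balancer",
    "load_balancer->app_server",
    "app_server->cache",
]
# On a miss the database read is added after the cache lookup.
CACHE_MISS_PATH = CACHE_HIT_PATH + ["app_server->postgresql"]

LATENCY_BUDGET_MS = 50  # hypothetical budget, not from the original example


def path_latency_ms(path):
    """Sum the annotated latency of every sequential hop on a path."""
    return sum(HOPS_MS[hop] for hop in path)


def within_budget(path, budget_ms):
    """True when the summed hop latency fits inside the budget."""
    return path_latency_ms(path) <= budget_ms


hit_ms = path_latency_ms(CACHE_HIT_PATH)
miss_ms = path_latency_ms(CACHE_MISS_PATH)
print(hit_ms, miss_ms, within_budget(CACHE_MISS_PATH, LATENCY_BUDGET_MS))
```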
D2 — Data Storage
The question: Where is all the data? Who owns each store? What is the consistency model for each? What is the replication topology?
A data storage diagram shows every persistent store — databases, caches, object stores, queues with retention — with the service that owns each one and the consistency model it offers. It shows the read/write paths between services and stores.
This diagram surfaces data consistency failures (stores written by multiple services without a consistency protocol), split-brain risk (stores without clear leader election), and single points of failure (single-node stores on the critical path).
The rule: Every store must have one named owner. If two services write to the same store, that store is a coordination point that requires an explicit consistency model. Name it on the diagram.
Example — E-commerce data ownership:
Users Service ──────▶ PostgreSQL (strong, leader-follower)
Products Service ───▶ MongoDB (eventual, replica set)
Sessions Service ───▶ Redis (volatile, single node) ← FM1: SPOF
Redis is single-node and volatile, which makes it a single point of failure for session data. MongoDB's eventual consistency model means product reads, particularly reads served from secondaries, may lag writes. These risks are invisible on a request flow diagram.
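The ownership rule lends itself to an automated check once the diagram is written down as data. The following is a rough sketch under stated assumptions: the `Store` record, the `audit` function, and the `on_critical_path` flag are hypothetical, and the store list simply mirrors the e-commerce example.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    name: str
    consistency: str
    topology: str
    writers: list = field(default_factory=list)
    on_critical_path: bool = False  # assumed flag, not part of the diagram


def audit(stores):
    """Return (store, finding) pairs for ownership and availability risks."""
    findings = []
    for store in stores:
        if not store.writers:
            findings.append((store.name, "no named owner"))
        elif len(store.writers) > 1:
            findings.append((store.name, "multiple writers"))
        if store.topology == "single node" and store.on_critical_path:
            findings.append((store.name, "single point of failure"))
    return findings


STORES = [
    Store("PostgreSQL", "strong", "leader-follower", ["Users Service"], True),
    Store("MongoDB", "eventual", "replica set", ["Products Service"], True),
    Store("Redis", "volatile", "single node", ["Sessions Service"], True),
]

findings = audit(STORES)
print(findings)
```

A store with two writers would be reported as `"multiple writers"`, which is the cue to name an explicit consistency model on the diagram.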
D3 — Event-Driven / Async Coupling
The question: What are the asynchronous dependencies? What produces events, what consumes them, and what is the ordering and delivery guarantee?
An event-driven diagram shows every message queue, event stream, or pub/sub channel, with producers, consumers, delivery guarantees (at-most-once, at-least-once, effectively-once), ordering guarantees, and dead-letter queue configuration.
Async coupling that is invisible on the request flow diagram is visible here. A system that looks simple from the synchronous request path may have complex async dependencies that produce data consistency issues, ordering failures, or unbounded queue depth.
The rule: Always annotate the delivery guarantee. "Events are consumed" is incomplete. "Events are consumed at-least-once, with consumer offsets committed after processing, and a dead-letter queue for unprocessable messages" is complete.
Example — Order processing fan-out:
Order Service ──▶ Kafka (Order Topic)
  (500 msg/s)        │
                     ├──▶ Inventory Service (at-least-once)
                     ├──▶ Payment Service   (at-least-once)
                     └──▶ Notification Svc  (at-most-once) ──▶ DLQ
Inventory and Payment use at-least-once delivery because they must not lose orders, so their handlers have to tolerate redelivered duplicates. Notification uses at-most-once because duplicate emails are worse than a missed one. The DLQ catches messages that still fail after the maximum number of retries. None of this is visible on the request flow or data storage diagrams.
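What "at-least-once with a dead-letter queue" implies for a consumer can be shown with a small in-memory simulation. This is a sketch, not Kafka client code: the message shape, `consume_at_least_once`, `reserve_inventory`, and the retry limit of 3 are all assumptions made for illustration.

```python
MAX_ATTEMPTS = 3  # hypothetical retry limit


def consume_at_least_once(messages, handler, processed_ids, dead_letters,
                          max_attempts=MAX_ATTEMPTS):
    """Process messages, marking each as done only after the handler succeeds."""
    for msg in messages:
        if msg["id"] in processed_ids:
            continue  # redelivered duplicate: already applied, skip it
        for attempt in range(1, max_attempts + 1):
            try:
                handler(msg)
                processed_ids.add(msg["id"])  # "commit" happens after processing
                break
            except Exception:
                if attempt == max_attempts:
                    dead_letters.append(msg)  # park it instead of retrying forever


reserved = []


def reserve_inventory(msg):
    # Hypothetical handler: an order without a SKU can never be processed.
    if msg["sku"] is None:
        raise ValueError("unprocessable order")
    reserved.append(msg["sku"])


processed, dlq = set(), []
batch = [
    {"id": 1, "sku": "A"},
    {"id": 1, "sku": "A"},   # duplicate caused by redelivery
    {"id": 2, "sku": None},  # poison message
    {"id": 3, "sku": "B"},
]
consume_at_least_once(batch, reserve_inventory, processed, dlq)
print(reserved, dlq)
```

The duplicate check is what makes at-least-once delivery safe here: without it, the redelivered order would reserve inventory twice.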
D4 — Data Pipeline
The question: How does data flow from raw events to queryable results? What is the latency from event to availability? What is the consistency between pipeline stages?
A data pipeline diagram shows the ingestion layer, processing stages, storage layers, and the latency annotation between each stage. It also shows the schema at each stage boundary and the backfill and reprocessing capability.
This diagram surfaces schema contract violations between pipeline stages and silent data corruption from late-arriving events or incorrect aggregations.
The rule: Always annotate latency at each stage and schema at each stage boundary. A pipeline diagram without these annotations cannot be used to reason about freshness or correctness.
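Annotating the schema at each boundary makes contract violations checkable. A minimal sketch, assuming schemas are written as field-to-type mappings; the field names, type labels, and the `contract_violations` helper are invented for illustration.

```python
def contract_violations(producer_schema, consumer_schema):
    """List fields the consumer expects that the producer omits or types differently."""
    problems = []
    for field_name, expected_type in consumer_schema.items():
        actual_type = producer_schema.get(field_name)
        if actual_type is None:
            problems.append(f"missing field: {field_name}")
        elif actual_type != expected_type:
            problems.append(
                f"type mismatch on {field_name}: {actual_type} != {expected_type}"
            )
    return problems


# Hypothetical schemas on either side of one stage boundary.
RAW_EVENT = {"user_id": "string", "event_type": "string", "ts": "int64"}
AGGREGATOR_INPUT = {"user_id": "string", "ts": "timestamp", "session_id": "string"}

violations = contract_violations(RAW_EVENT, AGGREGATOR_INPUT)
for line in violations:
    print(line)
```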
Example — Clickstream to analytics dashboard:
App DB ──1s──▶ CDC (Debezium) ──2s──▶ Kafka (raw) ──30s──▶ Flink (aggregates)
                                                               │
                                                          ClickHouse ──5s──▶ Grafana

Total event-to-dashboard: ~35–40s
Each stage is annotated. A user action takes roughly 35–40 seconds to appear on the dashboard. If the pipeline is backfilling, this can grow to hours. This staleness is invisible on D1 or D2.
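The freshness figure is just the sum of the stage annotations, which a few lines can verify. In this sketch the stage lags come from the diagram, the hop from Flink to ClickHouse is left out because the diagram does not annotate it, and the function names are assumptions.

```python
# (stage, lag in seconds) as annotated on the example pipeline.
STAGE_LAG_S = [
    ("App DB -> CDC (Debezium)", 1),
    ("CDC -> Kafka (raw)", 2),
    ("Kafka -> Flink (aggregates)", 30),
    ("ClickHouse -> Grafana", 5),
]


def end_to_end_lag_s(stages):
    """Total event-to-dashboard lag when stages run in sequence."""
    return sum(lag for _, lag in stages)


def slowest_stage(stages):
    """The stage that contributes the most lag, i.e. where to look first."""
    return max(stages, key=lambda stage: stage[1])


total_s = end_to_end_lag_s(STAGE_LAG_S)
bottleneck = slowest_stage(STAGE_LAG_S)
print(total_s, bottleneck)
```

The annotated stages sum to 38 seconds, which sits inside the stated 35 to 40 second range, and the Flink aggregation step accounts for most of it.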
D5 — Distributed Coordination
The question: Where does the system require multiple nodes to agree on something? What happens when they cannot agree? What is the quorum configuration?
A distributed coordination diagram shows every component that participates in leader election or consensus, the quorum configuration, the fencing or epoch mechanism that prevents split-brain, and the behaviour under partition — which side remains available, which side halts.
This diagram surfaces split-brain (systems that can produce two leaders simultaneously) and SPOF from insufficient quorum (a cluster configured with fewer replicas than needed to tolerate a failure).
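The quorum arithmetic behind "insufficient quorum" is short enough to write out. This sketch assumes a simple majority quorum, as used by Raft-style protocols; the function names are illustrative.

```python
def majority(cluster_size):
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size):
    """How many nodes can be lost while a majority is still reachable."""
    return cluster_size - majority(cluster_size)


for n in (1, 2, 3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

A two-node cluster tolerates no failures at all, and a four-node cluster tolerates no more than a three-node one, which is why odd cluster sizes are the usual choice.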
The rule: Always show the partition case. A diagram that only shows the healthy case leaves the consistency and availability properties of the system under failure undefined.
Example — 3-node database cluster (Raft):
Normal:
  Node A (Leader, epoch 42)
    ├──heartbeat──▶ Node B (Follower, ep:42)
    └──heartbeat──▶ Node C (Follower, ep:42)
Partition:
  [Node A isolated] ←─ partition ─▶ Node B (new Leader, ep:43) ──▶ Node C (Follower, ep:43)
  Node A steps down.
  B+C form majority (2 of 3). A's writes with epoch 42 are fenced by epoch 43.
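The epoch acts as a fencing token (Raft calls it the term). When the partition heals, Node A sees the higher epoch 43 and steps down, and any write still tagged with epoch 42 is rejected. Without it, A and B could both accept writes simultaneously, which is split-brain.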
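The fencing check itself is a single comparison. The sketch below is a generic fencing-token illustration rather than Raft's actual replication logic: `FencedStore` is a hypothetical component that remembers the highest epoch it has seen and refuses anything older.

```python
class FencedStore:
    """Accepts a write only if its epoch is not older than the newest one seen."""

    def __init__(self):
        self.highest_epoch = 0
        self.log = []

    def write(self, epoch, value):
        if epoch < self.highest_epoch:
            return False  # stale leader: fenced off
        self.highest_epoch = epoch
        self.log.append((epoch, value))
        return True


store = FencedStore()
a_before = store.write(42, "x=1")  # Node A, leader at epoch 42
b_after = store.write(43, "x=2")   # Node B, elected at epoch 43
a_stale = store.write(42, "x=3")   # Node A after the partition heals
print(a_before, b_after, a_stale, store.log)
```

Node A's late write is rejected because the store has already seen epoch 43, so only one leader's writes can land at any point in the log.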
Which Diagram for Which Problem
| If you need to know... | Draw this | Not this |
|---|---|---|
| Where latency comes from | D1 Request Flow | D2, D3, D4 — they do not show the synchronous request path |
| What happens if a disk fails | D2 Data Storage | D1 — it shows traffic, not data durability |
| What happens mid-message if a service crashes | D3 Event-Driven | D1 — it shows the synchronous path, not async coupling |
| How stale the analytics dashboard can be | D4 Data Pipeline | D1, D2 — they do not show pipeline freshness |
| What happens during a network partition | D5 Distributed Coordination | All others — none show the partition failure path |
The D5 diagram is the most commonly omitted and the most commonly needed in post-mortems.
In Practice
The most efficient way to audit a system is to draw all five diagrams from whatever documentation exists. Gaps in the diagrams are gaps in the design.
For a typical SaaS application:
| Diagram | What it reveals |
|---|---|
| D1 Request Flow | The API → service → database call chain; whether any external calls are in the critical path |
| D2 Data Storage | Which tables are shared between services (hidden coupling); whether the cache has a consistency model |
| D3 Event-Driven | The background job queue; whether it has a dead-letter queue and retry policy |
| D4 Data Pipeline | The analytics pipeline from events to dashboard; the lag between user action and visible metric |
| D5 Coordination | Whether the database has a leader-follower setup; what happens on leader failover |
None of these five diagrams can answer the other four's questions. All five together reveal the complete failure surface.