The Computing Series

Example: ADR for Choosing a Message Queue

The four-field structure described above is easy to agree with in principle. In practice, teams struggle with the right level of detail — too terse and the reasoning disappears; too verbose and the decision disappears. Here is a populated ADR that demonstrates the target level of precision.


ADR-0047: Adopt Apache Kafka for Inter-Service Communication

Context: The current system uses synchronous REST calls between services. During peak load (Black Friday, flash sales), downstream service latency spikes propagate upstream through the call chain, producing cascading failures (FM2). Three incidents in the past quarter were caused by the payment service slowing under load, which blocked the order service, which blocked the API gateway, which returned 503s to users. The synchronous coupling means every request is at least as slow as the slowest service in its call chain, and a stall anywhere in the chain stalls every caller above it.

Decision: Replace synchronous REST calls between services with asynchronous communication through Apache Kafka. Services publish events to Kafka topics; consuming services read from those topics independently. The order service publishes order.created events; the payment, inventory, and notification services each consume from that topic with independent consumer groups. Delivery guarantee: at-least-once with idempotent consumers. The tradeoff accepted is AT1 (Consistency vs Availability): we move from strong consistency (synchronous request-response confirms the downstream operation completed) to eventual consistency (the downstream operation is processed later, not necessarily before the API response returns). We also accept AT10 (Operational Simplicity vs Resilience): Kafka adds operational complexity (cluster management, partition rebalancing, offset tracking) in exchange for removing the synchronous call chain through which failures cascaded.

Consequences: Services are decoupled in time: a slow payment service no longer blocks order creation. Kafka’s retention enables event replay, which allows new consumers to be added without re-publishing historical events, provided those events are still within the topic’s retention period. The system becomes eventually consistent: a user who places an order will not see payment confirmation in the same response. The team must learn Kafka operations (broker management, consumer lag monitoring, partition strategy). Idempotency in every consumer is now a hard requirement, not a nice-to-have.

Status: Accepted. Owner: Platform Team (J. Patel). Date: 2025-11-14. Reversal condition: If Kafka operational overhead exceeds 20% of platform team capacity, or if eventual consistency causes enough user-facing confusion to push NPS below the target threshold, revisit with a managed queue service (such as Amazon SQS) or reintroduce synchronous calls for latency-sensitive paths only.
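The Decision field pairs at-least-once delivery with idempotent consumers. A minimal sketch of why that pairing matters, simulated in plain Python rather than with a Kafka client: `Topic`, `ConsumerGroup`, `IdempotentPaymentHandler` and the `event_id` field are hypothetical stand-ins, not Kafka's API, and the topic is assumed to have a single partition. Each consumer group keeps its own offset, and the payment handler deduplicates by event ID so a redelivered order.created event does not charge twice.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str      # hypothetical: a unique ID the producer attaches to each event
    order_id: str
    amount_cents: int


class Topic:
    """In-memory stand-in for a single-partition Kafka topic: an append-only log."""

    def __init__(self) -> None:
        self.log: list[Event] = []

    def publish(self, event: Event) -> None:
        self.log.append(event)


class ConsumerGroup:
    """Tracks its own committed offset, so each group reads the topic independently."""

    def __init__(self, topic: Topic, handler: Callable[[Event], None]) -> None:
        self.topic = topic
        self.handler = handler
        self.committed = 0  # offset of the next event this group will read

    def poll(self, crash_before_commit: bool = False) -> None:
        for event in self.topic.log[self.committed:]:
            self.handler(event)
        if crash_before_commit:
            return  # events were processed but the offset was not saved
        # Commit after processing: a crash before this line causes redelivery,
        # which is what makes the delivery at-least-once.
        self.committed = len(self.topic.log)


class IdempotentPaymentHandler:
    """Charges each order once, however many times its event is delivered."""

    def __init__(self) -> None:
        self.processed_ids: set[str] = set()
        self.charges: list[tuple[str, int]] = []

    def __call__(self, event: Event) -> None:
        if event.event_id in self.processed_ids:
            return  # duplicate delivery: already handled, skip the side effect
        self.charges.append((event.order_id, event.amount_cents))
        self.processed_ids.add(event.event_id)


# Usage: one topic, two consumer groups with independent offsets.
order_created = Topic()
payment = IdempotentPaymentHandler()
payments_group = ConsumerGroup(order_created, payment)
notified: list[str] = []
notifications_group = ConsumerGroup(order_created, lambda e: notified.append(e.order_id))

order_created.publish(Event("evt-1", "order-42", 1999))
payments_group.poll(crash_before_commit=True)  # charged, but offset not committed
payments_group.poll()                          # evt-1 redelivered and skipped
print(payment.charges)                         # [('order-42', 1999)]
print(payments_group.committed, notifications_group.committed)  # 1 0
notifications_group.poll()
print(notified)                                # ['order-42']
```

One caveat on the design: this sketch keeps the processed IDs in memory. In a real consumer, the deduplication record and the side effect (the charge) need to be persisted atomically, for example in the same database transaction; otherwise a crash between the two reopens the double-charge window.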


The ADR names both the tradeoff being accepted (AT1, AT10) and the condition under which the decision should be revisited. An engineer reading this eighteen months later knows not just what was decided, but why — and when to reopen the conversation.
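Those two properties are mechanical enough to check at review time. A minimal sketch, assuming a team stores ADRs as structured records, keeps the reversal condition in its own field, and cites tradeoffs by catalog code such as AT1; the `ADR` class, the `review` function and the `AT<n>` regex convention are all hypothetical and not part of any ADR tool.

```python
import re
from dataclasses import dataclass, replace

# Hypothetical convention: accepted tradeoffs are cited by catalog code, e.g. "AT1".
TRADEOFF_CODE = re.compile(r"\bAT\d+\b")


@dataclass(frozen=True)
class ADR:
    title: str
    context: str
    decision: str
    consequences: str
    status: str
    reversal_condition: str = ""  # assumption: stored separately from status


def review(adr: ADR) -> list[str]:
    """Return review problems; an empty list means the ADR passes this check."""
    problems = []
    for name in ("context", "decision", "consequences", "status"):
        if not getattr(adr, name).strip():
            problems.append(f"missing {name}")
    if not TRADEOFF_CODE.search(adr.decision):
        problems.append("decision does not name an accepted tradeoff")
    if not adr.reversal_condition.strip():
        problems.append("no reversal condition")
    return problems


adr_47 = ADR(
    title="ADR-0047: Adopt Apache Kafka for Inter-Service Communication",
    context="Synchronous REST calls cascade failures under peak load (FM2).",
    decision="Async events via Kafka, at-least-once. Accepts AT1 and AT10.",
    consequences="Eventual consistency; every consumer must be idempotent.",
    status="Accepted",
    reversal_condition="Kafka overhead exceeds 20% of platform team capacity.",
)
print(review(adr_47))  # []

# The same decision with the tradeoff and reversal condition stripped out.
vague = replace(adr_47, decision="Use Kafka.", reversal_condition="")
print(review(vague))
# ['decision does not name an accepted tradeoff', 'no reversal condition']
```

A check like this only catches an ADR that omits the tradeoff or the reversal condition; whether the named tradeoff is the right one remains a judgment for reviewers.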

Concept: The Language of Architecture Decisions

Thread: T12 (Tradeoffs) ← implicit tradeoff → named, documented, reversible

Core Idea: Technical disagreements are almost always disagreements about unnamed tradeoffs; naming the tradeoff changes the conversation from correctness to context-appropriateness.

Tradeoff: AT6 — ADR completeness vs simplicity; the useful ADR names the tradeoff and the reversal condition

Failure Mode: FM8 — contract violation; downstream teams infer guarantees from undocumented decisions and break when the system changes

Signal: When the same design decision gets relitigated more than once — the original decision did not name its tradeoff; write the ADR now

Maps to: Book 0, Framework 4
