Designing a Notification System: Fan-Out at Scale

A notification is a promise. When a user takes an action — places an order, receives a message, gets mentioned in a comment — the platform promises to tell them. Break that promise silently and nobody notices. Keep it twice and the user is annoyed. Keep it at 3 AM when they have quiet hours enabled and the user is angry. And when the promise must travel through four channels — push, email, SMS, in-app badge — each with its own third-party integration, failure mode, and rate limit, the simplicity of "send a notification" evaporates entirely.

The notification system is one of the most integration-heavy components in any platform. Its architecture is dominated by one shape: fan-out.

Why the Obvious Design Fails

The naive approach sends notifications synchronously inside the request handler that triggered the event. An order is placed; the handler calls APNS, sends an email, sends an SMS, then returns to the client. This fails three ways.

First, third-party API calls add hundreds of milliseconds to the request — users experience slow order placement because of notification delivery. Second, if APNS is temporarily down, the order request itself fails — a downstream notification failure has propagated into the primary business operation. Third, synchronous delivery leaves no layer to intercept and hold a notification, so rate limiting and quiet-hours logic have nowhere to live.

The Event-Driven Architecture

The fix decouples the notification from the thing that triggered it. The Order Service publishes an event to Kafka and returns to the client immediately. A separate Notification Service consumes those events asynchronously and fans out to channel-specific workers — a push worker calling APNS/FCM, an email worker calling SMTP, an SMS worker calling Twilio. A failure in notification delivery can no longer block an order confirmation.

This is AT10 (Synchronous vs Asynchronous) chosen deliberately: async delivery decouples notification failures from primary operations. The cost is end-to-end latency — for a non-priority notification, the path event → Kafka → notification service → channel queue → channel worker → APNS can take minutes if any hop is slow.
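The decoupling can be sketched in a few lines. This is a minimal illustration, not a production design: an in-process queue stands in for Kafka, and the names place_order, notification_consumer, and send_push are hypothetical. The point it shows is that the handler returns a confirmation even when the push provider is failing.

```python
import queue
import threading

# Hypothetical in-process stand-in for a Kafka topic (assumption for this sketch).
order_events = queue.Queue()
delivered = []

def send_push(event):
    # Simulate a push provider outage for one specific order.
    if event["order_id"] == "o-2":
        raise ConnectionError("push provider unavailable")

def place_order(order_id, user_id):
    """Request handler: publish an event and return without waiting on delivery."""
    order_events.put({"type": "order_placed", "order_id": order_id, "user_id": user_id})
    return {"order_id": order_id, "status": "confirmed"}

def notification_consumer():
    """Runs outside the request path, so a failure here never reaches place_order."""
    while True:
        event = order_events.get()
        if event is None:  # sentinel that stops the sketch
            break
        try:
            send_push(event)
        except ConnectionError:
            # A real consumer would schedule a retry; the order is already confirmed.
            continue
        delivered.append(event["order_id"])

worker = threading.Thread(target=notification_consumer)
worker.start()
responses = [place_order("o-1", "u-1"), place_order("o-2", "u-1")]
order_events.put(None)
worker.join()
```

Both orders come back confirmed; only the notification for the second one is missing, which is the failure isolation the asynchronous design is buying.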

Fan-out — one event becomes many deliveries

  Order Service
       │ publish event
       ▼
   [ Kafka ]
       │
  Notification Service ── fetch prefs + devices (on-demand)
       │
       ├──▶ push worker  ──▶ APNS   ──▶ iPhone
       ├──▶ push worker  ──▶ APNS   ──▶ iPad
       ├──▶ email worker ──▶ SMTP   ──▶ inbox
       └──▶ sms worker   ──▶ Twilio  ╳  (not in prefs)

One Event, Many Deliveries

The fan-out is not one event to one notification. It is one event to many deliveries. A user with an iPhone and an iPad receives a push on both. If they also want email, that is a third delivery. Channel routing resolves it:

Fetch user preferences: [push_ios, email] — no SMS.
Fetch user devices: [iPhone-A, iPad-B].
Generate deliveries: push→iPhone-A, push→iPad-B, email→address.
Enqueue each delivery to its channel worker.

Preferences and device tokens are fetched at delivery time, not embedded in the event — that is AT4 (Precomputation vs On-Demand). If a user changes preferences between event publication and delivery, the on-demand fetch gets the current value; an event carrying a stale preference snapshot would not.
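The routing steps above can be sketched as a single function. This is an illustration under stated assumptions: the in-memory dictionaries stand in for preference and device services, and the Delivery type, the fan_out function, and the ID format (event ID plus channel plus target) are hypothetical choices, not something the design prescribes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Delivery:
    notification_id: str  # deterministic, so a replayed event yields the same IDs
    channel: str
    target: str

# Hypothetical in-memory stores standing in for the preference and device services.
PREFERENCES = {"u-1": ["push_ios", "email"]}
DEVICES = {"u-1": ["iPhone-A", "iPad-B"]}
EMAILS = {"u-1": "user@example.com"}

def fan_out(event):
    """Resolve one event into per-channel deliveries using current preferences."""
    user_id, event_id = event["user_id"], event["event_id"]
    prefs = PREFERENCES.get(user_id, [])  # looked up now, not carried in the event
    deliveries = []
    if "push_ios" in prefs:
        for device in DEVICES.get(user_id, []):
            deliveries.append(Delivery(f"{event_id}:push:{device}", "push", device))
    if "email" in prefs and user_id in EMAILS:
        deliveries.append(Delivery(f"{event_id}:email", "email", EMAILS[user_id]))
    return deliveries

event = {"event_id": "evt-42", "user_id": "u-1", "type": "order_placed"}
deliveries = fan_out(event)
```

Because the event carries only identifiers, a preference change made after publication takes effect the next time fan_out runs: removing "email" from the user's preferences yields two deliveries instead of three for the same event.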

At-Least-Once, Without Duplicates

Push APIs do not guarantee delivery: devices go offline, tokens expire, networks time out. At-least-once delivery means retrying on failure, with exponential backoff and a maximum retry count. But a retry must not deliver the notification twice. Each delivery carries a globally unique notification ID; before sending, the channel worker runs an idempotency check on that ID (for example, an atomic set-if-absent in Redis) and skips the send if the ID was already delivered.
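A rough sketch of how the retry and the duplicate check fit together follows. Everything here is an assumption for illustration: DedupStore is an in-memory stand-in for Redis whose claim method imitates a set-if-absent, and deliver, backoff_delays, and the default parameters are hypothetical. The sketch claims the ID before sending and releases it if every attempt fails; a worker that crashes between the claim and the send would lose the notification, so a real system would pair the claim with an expiry.

```python
class DedupStore:
    """In-memory stand-in for Redis; claim() imitates an atomic set-if-absent."""

    def __init__(self):
        self._seen = set()

    def claim(self, notification_id):
        if notification_id in self._seen:
            return False
        self._seen.add(notification_id)
        return True

    def release(self, notification_id):
        self._seen.discard(notification_id)

def backoff_delays(base=1.0, factor=2.0, max_retries=5):
    """Exponential schedule: base, base*factor, base*factor^2, ..."""
    return [base * factor ** attempt for attempt in range(max_retries)]

def deliver(notification_id, send, store, max_retries=5, sleep=lambda seconds: None):
    """Send once per notification ID, retrying transient failures with backoff.

    sleep defaults to a no-op so the sketch runs instantly; a real worker
    would pass time.sleep or reschedule the delivery on a delayed queue.
    """
    if not store.claim(notification_id):
        return "duplicate"
    # One immediate attempt, then one attempt per backoff delay.
    for delay in [0.0] + backoff_delays(max_retries=max_retries):
        sleep(delay)
        try:
            send()
            return "sent"
        except ConnectionError:
            continue
    store.release(notification_id)  # let a later redelivery try again
    return "failed"
```

A second call to deliver with the same notification ID returns "duplicate" without invoking send, which is what keeps a redelivered queue message from reaching the user twice.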

Two more constraints ride on top. Rate limiting (per user, per type, per window) stops a cascading event from dumping 500 notifications on one person in an hour. Quiet hours hold non-urgent notifications in a batch until the quiet window ends. CRITICAL notifications (fraud alerts, 2FA codes) bypass both quiet hours and rate limits entirely.
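One way to picture these rules is a single gate that every delivery passes through before it is enqueued. The sketch below is hypothetical throughout: the DeliveryGate class, the sliding-window limiter, the 22:00 to 07:00 default window, and the choice to suppress (rather than collapse or defer) rate-limited notifications are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from datetime import time

class DeliveryGate:
    """Decides whether a delivery is sent now, held for later, or suppressed."""

    def __init__(self, limit, window_seconds, quiet_start=time(22, 0), quiet_end=time(7, 0)):
        self.limit = limit
        self.window_seconds = window_seconds
        self.quiet_start = quiet_start
        self.quiet_end = quiet_end
        self._sent = defaultdict(deque)  # (user_id, kind) -> recent send timestamps

    def in_quiet_hours(self, local_time):
        if self.quiet_start <= self.quiet_end:
            return self.quiet_start <= local_time < self.quiet_end
        # The window wraps past midnight, e.g. 22:00 to 07:00.
        return local_time >= self.quiet_start or local_time < self.quiet_end

    def decide(self, user_id, kind, priority, now_ts, local_time):
        if priority == "CRITICAL":
            return "send"  # bypasses quiet hours and rate limits
        if self.in_quiet_hours(local_time):
            return "hold"  # batched until the quiet window ends
        window = self._sent[(user_id, kind)]
        while window and window[0] <= now_ts - self.window_seconds:
            window.popleft()  # forget sends that fell out of the window
        if len(window) >= self.limit:
            return "suppress"
        window.append(now_ts)
        return "send"
```

With a limit of two per hour, a third comment notification at midday is suppressed, a routine notification at 3 AM is held, and a fraud alert at 3 AM is sent immediately.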

Where It Fails: The Thundering Herd

The signature failure of a notification system is FM7 (Thundering Herd). A platform-wide event — a popular product launches, a major news event, an outage that itself cascades into alerts — triggers millions of notifications simultaneously. Worker queues overflow; the SMS gateway rate-limits your entire platform. The defence is priority queuing: critical notifications are processed first, and backpressure on non-critical queues stops them starving the critical ones.
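Priority queuing with backpressure can be sketched with a heap and a depth limit. This is a simplified, single-process illustration; the PriorityDispatcher class, its two priority levels, and the reject-when-full policy are assumptions, and a real deployment would more likely use separate queues or topics per priority.

```python
import heapq
import itertools

class PriorityDispatcher:
    """Critical items always dequeue first; non-critical intake is bounded."""

    CRITICAL, NORMAL = 0, 1

    def __init__(self, max_normal_depth):
        self.max_normal_depth = max_normal_depth
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that keeps FIFO order per priority
        self._normal_depth = 0

    def offer(self, priority, item):
        """Return False to signal backpressure instead of growing without bound."""
        if priority != self.CRITICAL:
            if self._normal_depth >= self.max_normal_depth:
                return False
            self._normal_depth += 1
        heapq.heappush(self._heap, (priority, next(self._seq), item))
        return True

    def take(self):
        if not self._heap:
            return None
        priority, _, item = heapq.heappop(self._heap)
        if priority != self.CRITICAL:
            self._normal_depth -= 1
        return item
```

When offer returns False, the producer is expected to slow down, defer, or drop the non-critical notification; critical items are never refused by this limit, so a flood of marketing pushes cannot delay a fraud alert that arrives behind them.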

Two failures travel with it. FM3 (Unbounded Resource Consumption): if APNS is down for hours, retry queues grow without limit, because every failed delivery is re-enqueued while new failures keep arriving; a maximum retry count, with exhausted deliveries moved to a dead-letter queue, bounds the growth. And FM5 (Latency Amplification): the multi-hop async path means a critical notification can take minutes, which is unacceptable for a 2FA code, so critical notifications need a synchronous fast path that bypasses the async pipeline.
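The bound on retry growth can be made concrete with a small sketch. The drain function, the (delivery, attempts_made) tuple layout, and the provider_down stub are hypothetical, and the loop retries immediately for brevity where a real worker would wait out a backoff delay between attempts.

```python
from collections import deque

def drain(retry_queue, send, max_attempts, dead_letters):
    """Process (delivery, attempts_made) pairs until the retry queue is empty."""
    while retry_queue:
        delivery, attempts_made = retry_queue.popleft()
        try:
            send(delivery)
        except ConnectionError:
            attempts_made += 1
            if attempts_made >= max_attempts:
                dead_letters.append(delivery)  # parked for inspection, no more retries
            else:
                retry_queue.append((delivery, attempts_made))

attempts = []

def provider_down(delivery):
    # Simulates a provider outage: every send fails.
    attempts.append(delivery)
    raise ConnectionError("provider unavailable")

retry_queue = deque((f"n-{i}", 0) for i in range(3))
dead_letters = []
drain(retry_queue, provider_down, max_attempts=3, dead_letters=dead_letters)
```

With three deliveries and a cap of three attempts, the outage costs exactly nine send calls, after which the retry queue is empty and all three deliveries sit in the dead-letter list. Total work is bounded by the number of deliveries times the attempt cap, regardless of how long the provider stays down.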

Real Systems

Firebase Cloud Messaging is the dominant Android push service, offering topic-based fan-out (one send to millions of subscribers) and handling token management and retry. AWS SNS is a managed fan-out service supporting APNS, FCM, email, SMS, and HTTP endpoints from a single API, with built-in retry and dead-lettering. Twilio abstracts global SMS carrier integrations and rate-limits per country, since SMS throughput limits differ by jurisdiction.

The One Sentence

A notification system is one triggering event fanning out to many channel deliveries, and every hard part exists because "send a notification" was never actually simple: async decoupling so a dead push service cannot fail an order, on-demand preference fetch so deliveries reflect the user's current preferences, deduplication so at-least-once never means twice, and priority queuing so a thundering herd cannot bury a fraud alert.

Concept: A notification system is one triggering event fanning out to many channel-specific deliveries.

Core Idea: An event-driven pipeline — publish to Kafka, consume asynchronously, resolve preferences and devices on-demand, deduplicate by notification ID, enforce priority.

Tradeoff: AT10 — Synchronous vs Asynchronous: async delivery decouples notification failure from the order, at the cost of multi-hop end-to-end latency.

Failure Mode: FM7 — Thundering Herd: a platform-wide event triggers millions of notifications at once, overflowing worker queues.

Signal: When a delivery path can dump 500 notifications on one user, it needs rate limiting, quiet hours, and priority queuing.

Series: Book 4, Ch 15