Message Queues: From FIFO to Brokers in Production

On December 9, 2013, a bug in a third-party payment processor caused 20,000 payment-confirmation requests to hit an e-commerce platform's notification service simultaneously. The service was designed to process one request at a time, had no queue, ran out of threads, refused connections, and crashed. The payment processor's retry mechanism sent the 20,000 requests again. The service crashed again. Four hours later, the retry queue had grown to 200,000 entries, no confirmation emails had been sent, and the notification service was unavailable for all traffic — not just the burst.

The fix was a message queue: a buffer between the payment processor and the notification service. Requests are written to the queue instantly. The notification service reads at whatever rate it can sustain. If the producer writes faster than the consumer can process, the queue absorbs the difference. The queue does not solve the problem of processing 20,000 emails — those still need to be sent — but it converts a synchronous overload crash into an asynchronous processing backlog. The consumer clears it at its own pace.

This is the shape of every message queue. The shape is the same as the FIFO queue you built in your first data structures course. The scale and the consequences are different.

What a Message Queue Actually Is

A message queue is a durable, ordered buffer that decouples producers from consumers. Producers write messages without waiting. Consumers read at their own pace. That is the entire idea.

The decoupling is what matters. When a payment service calls a notification service synchronously and waits for a response, the notification service's availability and latency directly affect the payment service. If the notification service is slow, the payment service is slow. If the notification service is down, the payment service is down. A message queue breaks this coupling: the payment service writes a message and returns. The notification service processes the message later. Their execution is temporally decoupled.

This is AT10 (Synchronous vs Asynchronous) in action. The queue removes the producer's dependency on the consumer's availability and latency. The cost is that the producer cannot know the outcome of the consumer's processing. For operations where the caller needs an immediate result — did the payment succeed? — a queue is not the right tool. Queues are appropriate for operations where fire-and-forget is an acceptable contract.

Delivery Semantics: The Choice You Must Make

Every message queue forces an explicit choice about what guarantee it provides about delivery. There are three options:

Semantic	Guarantee	Cost	Use case
At-most-once	Delivered 0 or 1 times	Possible message loss	Metrics, logs (loss is acceptable)
At-least-once	Delivered 1 or more times	Possible duplicates	Payments, orders (loss not acceptable; duplicates handled by idempotency)
Exactly-once	Delivered exactly 1 time	Highest cost; transactional producers + idempotent consumers	Financial ledgers

The implementation difference between at-most-once and at-least-once is one line of code: when does the consumer commit its offset?

// At-most-once: commit BEFORE processing.
// If consumer crashes after commit but before processing: message lost.
function consume_at_most_once(group, partition_id):
    messages = consume(group, partition_id)
    commit_offset(group, partition_id, messages[-1].offset + 1)
    for m in messages:
        process(m)   // may never complete

// At-least-once: commit AFTER processing.
// If consumer crashes after processing but before commit: message redelivered.
function consume_at_least_once(group, partition_id):
    messages = consume(group, partition_id)
    for m in messages:
        process(m)   // if crash here, redelivered after restart
    commit_offset(group, partition_id, messages[-1].offset + 1)

Exactly-once is the same as at-least-once with an idempotency check: a consumer that sees the same message twice produces the same result. The idempotency key is usually a business-level identifier (payment_id, order_id) stored in a table the consumer checks before processing:

function process_payment_idempotent(message):
    payment_id = message.value.payment_id
    if database.exists("processed_payments", payment_id):
        return   // already processed; skip safely
    database.record_payment(payment_id, message.value)
    database.mark_processed("processed_payments", payment_id)

The check and the write must be in the same transaction. Otherwise the failure mode is silent duplication.

Kafka: How the Modern Queue Is Built

Kafka is the dominant distributed message queue. It differs from traditional queues in two structural ways. First, messages are not deleted after consumption — they are retained for a configurable period (default 7 days). Second, multiple independent consumer groups can each read the entire topic at their own pace.

The model has three primitives: topics → partitions → consumer groups.

A topic is a named log.
A topic has N partitions, each an ordered, append-only log of messages.
A consumer group reads a topic; each partition is assigned to exactly one consumer in the group.

Producers write messages with an optional partition key. If a key is provided, the producer hashes it and routes to a deterministic partition: partition = hash(key) % num_partitions. All messages with the same key land on the same partition. If no key is provided, messages round-robin across partitions.

This is where the ordering guarantee lives: Kafka guarantees ordering within a partition, not across partitions. If your use case is a user event stream where events for a user must be processed in order, the partition key must be user_id. Use order_id and the events for a single customer will scatter across partitions, with no ordering guarantee between them.

Queue vs Direct Call: When to Pick Which

The decision rule is simple. Use a queue when:

The producer can tolerate an asynchronous response (fire and forget).
The consumer may be slow or unavailable.
Traffic is bursty — the queue absorbs peaks.
Multiple consumers need the same message (fan-out via consumer groups).

Use a direct call (RPC, HTTP) when:

The producer needs an immediate response.
The operation is part of a transaction (must succeed or fail atomically).
Latency is critical and queuing adds unacceptable delay.

The split shows up clearly in an order flow: payment is synchronous (must succeed before order is confirmed), notification is asynchronous (does not need to complete before the user sees the confirmation).

function place_order(user_id, items, payment_info):
    payment_result = payment_service.charge(user_id, payment_info, total)
    if not payment_result.success:
        return OrderFailed(reason=payment_result.error)

    order = database.create_order(user_id, items)
    queue.produce("order_placed", key=order.id, value={
        "order_id": order.id, "user_id": user_id, "email": user.email
    })
    return OrderConfirmed(order_id=order.id)
    // Response returned before email is sent.

Where Queues Fail

FM3 — Unbounded Resource Consumption. A queue can grow without bound when consumers process more slowly than producers write. A Kafka queue depth of 10 million messages is not inherently a problem — Kafka is designed for retention. But the time for a consumer to catch up is proportional to the backlog. A notification service that is 10 million messages behind will take hours to catch up at normal processing rates. During that window every notification is late. Backpressure — the mechanism by which consumers signal producers to slow down — is not built into Kafka. Producers write at maximum speed regardless of consumer lag. Monitoring consumer lag (the offset difference between the latest produced message and the latest committed consumer offset) and alerting when lag exceeds a threshold is the primary defense.

FM8 — Schema/Contract Violation. A Kafka topic persists messages for days or weeks. Consumers reading those messages may run different code versions than the producer. When the message schema changes — field added, renamed, removed — older consumers may fail to deserialise newer messages, or vice versa. Without a schema registry and compatibility enforcement, schema changes silently break consumers. The failure mode is delayed: the consumer processes all messages until it hits the first message produced by the new producer version, then fails on every subsequent message.

The Three Real Systems Engineers Use

Apache Kafka is the standard for high-throughput distributed messaging. LinkedIn built it in 2011 to handle activity stream data at a volume traditional queues could not sustain. Retention, consumer group rebalancing, sequential log append architecture. Millions of messages per second per cluster.

Amazon SQS is a managed queue providing at-least-once delivery. Unlike Kafka, SQS does not retain messages after consumption, does not support multiple independent consumer groups reading the same queue, and does not guarantee strict ordering (FIFO queues are a separate product with lower throughput). Use it when the simplicity of not managing Kafka is worth more than Kafka's advanced features.

RabbitMQ implements AMQP and supports complex routing topologies: messages flow from exchanges to queues based on routing keys and bindings. A single exchange can route to multiple queues by pattern — fan-out, direct delivery, topic-based routing — in one broker. Used when routing flexibility matters and you do not need Kafka-scale throughput.

The Thread

The shape of a queue is the same in three places: the FIFO data structure you built in Book 1, the topological-sort traversal in Book 2 that processes dependencies in order, and the Kafka partition in production. The mathematics is identical. The scale and the consequences are different. This is T4 (Queues) — one of the twelve recurring threads across The Computing Series.

This article extracts the core of Book 3, Chapter 13 — Message Queues. The chapter covers the full Kafka architecture (consumer group rebalancing, offset commit protocols), the cost of exactly-once semantics in distributed transactions, the dispatch-service stream-join pattern, the analytics micro-batch protocol, and the worked design problem for a ride-sharing event pipeline — including the exact partition-count math for a 1-second processing SLA.

Read Book 3, Chapter 13 →