Exercises

Level 1 — Understand

Why do WebSockets replace HTTP polling for real-time chat? Name two specific limitations of polling that WebSockets eliminate.
What is the connection registry and what information does it store? Why is it necessary for routing messages between users on different chat servers?
How does the chat system guarantee message ordering within a conversation, and what mechanism allows a client to detect and recover missed messages after reconnecting?
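As a starting point for the third question, the sketch below shows one common way a client can detect gaps with per-conversation sequence numbers. It is a minimal illustration under stated assumptions, not necessarily the design the chapter describes: the class `ConversationClient` and its methods are hypothetical names, and the server is assumed to report the highest sequence number it holds when the client reconnects.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationClient:
    """Hypothetical client-side state for a single conversation."""
    last_seq: int = 0                            # highest contiguous seq delivered so far
    pending: dict = field(default_factory=dict)  # out-of-order messages, keyed by seq

    def receive(self, seq: int, body: str) -> list:
        """Buffer the message, then release every message that is now contiguous."""
        if seq <= self.last_seq:
            return []                            # duplicate: already delivered
        self.pending[seq] = body
        delivered = []
        while self.last_seq + 1 in self.pending:
            self.last_seq += 1
            delivered.append(self.pending.pop(self.last_seq))
        return delivered

    def missing(self, server_max_seq: int) -> list:
        """Sequence numbers to request from the server after a reconnect.

        Assumes the server tells the client its current maximum seq.
        """
        return [s for s in range(self.last_seq + 1, server_max_seq + 1)
                if s not in self.pending]
```

Buffering out-of-order messages rather than displaying them immediately is one design choice among several; a client could instead display them and backfill the gap later.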

Level 2 — Apply

A chat server holds 50,000 active WebSocket connections. The server restarts. Clients use exponential backoff starting at 1 second with a factor of 2, capped at 60 seconds, with ±50% jitter. (a) How many clients attempt reconnection in the first second? (b) After 3 backoff intervals, what is the range of reconnect times for a given client? (c) What FM code describes what happens if all 50,000 clients reconnect without backoff?
A conversation between users A and B has sequence numbers up to seq=100. A sends a message that is stored as seq=101. B’s client last received seq=98. When B reconnects, what sequence numbers does B request? If seq=99 and seq=100 were from B (A never received them), does A’s client need to sync too?
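To check your arithmetic for the reconnect exercise, the backoff schedule can be sketched as below. The function names are hypothetical, and two details are assumptions the exercise does not pin down: attempts are counted from 0, and jitter is applied after the cap, so a jittered delay can exceed 60 seconds.

```python
import random


def delay_bounds(attempt: int, base: float = 1.0, factor: float = 2.0,
                 cap: float = 60.0, jitter: float = 0.5) -> tuple:
    """Return (min, max) delay in seconds for a 0-based reconnect attempt."""
    nominal = min(cap, base * factor ** attempt)   # capped exponential delay
    return nominal * (1.0 - jitter), nominal * (1.0 + jitter)


def backoff_delay(attempt: int, base: float = 1.0, factor: float = 2.0,
                  cap: float = 60.0, jitter: float = 0.5) -> float:
    """Draw one jittered delay uniformly from the bounds for this attempt."""
    nominal = min(cap, base * factor ** attempt)
    # Assumption: jitter is a uniform multiplier in [1 - jitter, 1 + jitter].
    return nominal * random.uniform(1.0 - jitter, 1.0 + jitter)
```

Calling `delay_bounds(n)` for successive values of `n` gives the window in which a single client's next attempt can land. Spreading 50,000 clients across that window, rather than having them all retry at the same instant, is the point of the jitter.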

Level 3 — Design

A team collaboration tool supports group chats with up to 10,000 members per channel. A message sent to a 10,000-member channel must be delivered to all active members within 2 seconds. Design the message routing and fan-out architecture. How does fan-out to 10,000 members differ from 1:1 chat? What is the AT10 tradeoff in delivering to all members synchronously vs. asynchronously? What FM12 risk exists when a member is connected to a different regional server than the message sender? How do you handle members who are offline?

A complete answer will:
(1) Design a server-to-server fan-out architecture where the sending user’s WebSocket server publishes the message to a pub/sub broker (e.g., Redis Pub/Sub or Kafka), and each regional server subscribed to that channel delivers the message to its locally connected members. Contrast this with 1:1 chat, where fan-out is always exactly one connection and no pub/sub coordination is needed.
(2) Name AT5 (Centralisation/Distribution) for the fan-out architecture decision: centralised fan-out through one server creates a bottleneck at 10,000 concurrent channel sends; distributed pub/sub scales but introduces inter-server message hops that add latency. State the latency budget impact and whether the 2-second target is achievable.
(3) Name FM12 (network partition) for the cross-region routing risk: if the pub/sub broker is partitioned from a regional server, members in that region miss messages during the partition window. Specify whether the design chooses to drop messages (availability) or block delivery (consistency), and name the AT1 tradeoff.
(4) Describe offline member handling: messages are persisted to a message store and delivered via push notification (mobile) or on next WebSocket connect (desktop). Specify the message ordering guarantee (e.g., sequence numbers or timestamps) and how the client reconciles missed messages on reconnect.
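The fan-out shape described in point (1) can be sketched in a few lines. This is an in-memory illustration only: `InMemoryBroker` and `ChatServer` are hypothetical classes standing in for a real broker (such as Redis Pub/Sub or Kafka) and for WebSocket servers, and the sketch does not model persistence, offline members, partitions, or latency.

```python
from collections import defaultdict


class InMemoryBroker:
    """Hypothetical stand-in for a pub/sub broker."""

    def __init__(self):
        self.subscribers = defaultdict(list)   # channel -> subscribed ChatServer objects

    def subscribe(self, channel, server):
        if server not in self.subscribers[channel]:
            self.subscribers[channel].append(server)

    def publish(self, channel, message):
        # The broker fans out once per subscribed server, not once per member.
        for server in self.subscribers[channel]:
            server.on_broker_message(channel, message)


class ChatServer:
    """Hypothetical WebSocket server that knows only its locally connected members."""

    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.local_members = defaultdict(set)  # channel -> locally connected user ids
        self.delivered = []                    # stands in for writes to WebSocket connections

    def connect(self, user, channel):
        self.local_members[channel].add(user)
        self.broker.subscribe(channel, self)

    def send(self, channel, seq, body):
        # The sending server publishes once, regardless of channel size.
        self.broker.publish(channel, {"seq": seq, "body": body})

    def on_broker_message(self, channel, message):
        # Each server delivers only to the members connected to it.
        for user in sorted(self.local_members[channel]):
            self.delivered.append((user, channel, message["seq"], message["body"]))
```

With two servers subscribed to the same channel, a single `send` results in one publish and one local delivery loop per server. In the 1:1 case the same message would need only a registry lookup and a single connection write.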
