The Computing Series

Real Systems

HAProxy is among the most widely deployed open-source load balancers. It supports Layer 4 and Layer 7 routing, a dozen or so balancing algorithms, and health checks with configurable rise/fall thresholds. It is used by GitHub, Stack Overflow, and Tumblr. Its architecture is event-driven with non-blocking I/O (similar to Nginx): historically a single-threaded event loop per process, and multi-threaded since version 1.8, which gives it extremely high throughput with low latency.
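A minimal HAProxy configuration sketch illustrates the pieces named above: a balancing algorithm, health-check probes, and rise/fall thresholds. Server names, addresses, and the `/healthz` path are illustrative, not taken from any real deployment:

```
# Illustrative sketch; names, addresses, and timings are hypothetical.
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend www
    bind *:80
    default_backend app_pool

backend app_pool
    balance leastconn                  # route to the server with fewest connections
    option httpchk GET /healthz        # Layer 7 health probe
    # check every 2s; mark down after 3 failures, back up after 2 successes
    server app1 10.0.0.11:8080 check inter 2s fall 3 rise 2
    server app2 10.0.0.12:8080 check inter 2s fall 3 rise 2
```

The `fall` and `rise` counts are the hysteresis knobs: a single failed probe does not eject a server, and a single success does not restore one.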

AWS Application Load Balancer (ALB) is a managed Layer 7 load balancer. It routes based on URL path, host and HTTP headers, query strings, and source IP. It integrates with AWS Auto Scaling: as new instances launch, they are automatically registered with the target group and added to the routing pool, and instances that fail health checks are automatically deregistered. Because the service is managed, AWS handles the load balancer's own redundancy, eliminating that operational burden.

Kubernetes’ kube-proxy implements service load balancing inside a cluster using iptables or IPVS rules. Every node runs kube-proxy, which maintains routing rules so that traffic to any service ClusterIP is distributed across the pods that back it. This is load balancing at the network layer inside the cluster, fully distributed — there is no central load balancer.
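As a sketch, the pool that kube-proxy distributes over is defined by a Service's label selector. The manifest below is illustrative (names and ports are hypothetical); traffic sent to the Service's ClusterIP is spread across all Ready pods matching the selector:

```yaml
# Illustrative manifest; names and ports are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  type: ClusterIP        # virtual IP; kube-proxy programs iptables/IPVS rules for it
  selector:
    app: api             # the pod pool: every Ready pod with this label
  ports:
    - port: 80           # port on the ClusterIP
      targetPort: 8080   # container port on each backing pod
```

There is no load balancer process in the data path here: the kernel's packet-rewriting rules on each node do the distribution.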


Concept: Load Balancing

Thread: T11 (Feedback Loops) ← benchmarking and measurement (Book 2, Ch 19) → autoscaler control loops (Book 3, Ch 22); T8 (Divide & Conquer) ← distributing load across N servers is the infrastructure form of divide and conquer

Core Idea: A load balancer distributes incoming requests across a pool of servers using a routing algorithm (round-robin, least-connections, consistent-hashing) and continuously measures server health via probes. The health check loop is a feedback system: unhealthy servers are removed from the pool; recovered servers are re-added after hysteresis. The load balancer must itself be redundant to avoid becoming the SPOF it was designed to prevent.
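The feedback loop described above can be sketched in a few lines. This is a minimal illustration, not any real load balancer's implementation; the threshold values and server names are made up:

```python
# Sketch of a server pool with round-robin routing and a health-check
# feedback loop with hysteresis. Thresholds and names are illustrative.
class Pool:
    def __init__(self, servers, fall=3, rise=2):
        self.fall, self.rise = fall, rise      # hysteresis thresholds
        self.healthy = set(servers)
        self.fails = {s: 0 for s in servers}
        self.successes = {s: 0 for s in servers}
        self._order = list(servers)
        self._idx = 0

    def record_probe(self, server, ok):
        """Feedback loop: probe results move servers in and out of the pool."""
        if ok:
            self.fails[server] = 0
            self.successes[server] += 1
            if server not in self.healthy and self.successes[server] >= self.rise:
                self.healthy.add(server)       # re-add after `rise` consecutive successes
        else:
            self.successes[server] = 0
            self.fails[server] += 1
            if server in self.healthy and self.fails[server] >= self.fall:
                self.healthy.discard(server)   # eject after `fall` consecutive failures

    def next_server(self):
        """Round-robin over the currently healthy subset."""
        for _ in range(len(self._order)):
            s = self._order[self._idx % len(self._order)]
            self._idx += 1
            if s in self.healthy:
                return s
        raise RuntimeError("no healthy servers")

pool = Pool(["app1", "app2", "app3"])
for _ in range(3):
    pool.record_probe("app2", ok=False)        # three failed probes: app2 is ejected
print(pool.next_server(), pool.next_server())  # skips app2
```

The hysteresis (requiring several consecutive successes before re-adding) prevents a flapping server from oscillating in and out of the pool on every probe.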

Tradeoff: AT6 — Generality vs. Specialisation (Layer 4 is fast and general; Layer 7 enables content-aware routing at the cost of per-connection parsing overhead — choose based on whether routing decisions need to inspect request content)

Failure Mode: FM1 — Single Point of Failure (an unreplicated load balancer is the single point of failure for every service behind it; active-passive failover or anycast routing eliminates this; the failure symptom is total simultaneous outage of all downstream services)

Signal: When multiple services go down simultaneously with no error logs — the load balancer or a shared network component has failed. When one server in a pool is consistently receiving 3–5× the load of others — the routing algorithm has a hotspot. When P99 latency spikes after a deployment and then recovers — a server was briefly unhealthy and was not removed from the pool fast enough.
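The hotspot signal above is easy to check numerically: compare each server's request count to the pool median. The counts and the 3x threshold below are illustrative:

```python
# Flag servers receiving far more traffic than the pool median.
# A ratio near 1.0 means an even spread; counts here are made up.
from statistics import median

def hotspots(request_counts, ratio=3.0):
    m = median(request_counts.values())
    return {s: c / m for s, c in request_counts.items() if c / m >= ratio}

counts = {"app1": 1020, "app2": 980, "app3": 4100}
print(hotspots(counts))   # app3 carries ~4x the median load
```

A persistent skew like this usually points at the routing algorithm (e.g. a consistent-hashing hot key or sticky sessions) rather than at the server itself.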

Maps to: Reference Book, Framework 8 (Infrastructure Components — Load Balancer), Framework 3 (Failure Modes — Single Point of Failure)

