The Load Balancer: What It Guarantees and What It Doesn't

The request was load-balanced successfully. Round-robin selected server three. The HTTP response came back 200 OK. The user saw a spinner for eight seconds, then an error. Server three had passed its health check four seconds earlier. Its database connection pool was exhausted. The health check endpoint never touches the database.

The load balancer did its job. The user still got a broken experience. These two facts are not in conflict.

What a Load Balancer Actually Does

A load balancer sits in front of a pool of servers. Requests arrive at the balancer's IP address. The balancer picks one server and forwards the request. The response travels back through the balancer to the caller.

Client
  │
  ▼
[Load Balancer]
  │        │        │
  ▼        ▼        ▼
Server 1  Server 2  Server 3

The balancer distributes requests. That is the complete guarantee. It does not inspect what the server does with the request. It does not verify that the response is correct. It does not know if a server is processing requests slowly, returning corrupt data, or failing on a specific class of request.

A load balancer distributes requests. It does not distribute correctness.
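
A minimal sketch of that forwarding step, in Go, using only the standard library. The backend addresses are placeholders, and the selection logic here is round-robin, the first of the algorithms below.

package main

import (
    "net/http"
    "net/http/httputil"
    "net/url"
    "sync/atomic"
)

// mustParse panics on a bad URL; fine for a fixed startup list.
func mustParse(raw string) *url.URL {
    u, err := url.Parse(raw)
    if err != nil {
        panic(err)
    }
    return u
}

func main() {
    // Placeholder backend pool.
    backends := []*url.URL{
        mustParse("http://10.0.0.1:8080"),
        mustParse("http://10.0.0.2:8080"),
        mustParse("http://10.0.0.3:8080"),
    }
    var counter uint64

    proxy := &httputil.ReverseProxy{
        // Director rewrites each incoming request to point at the
        // next backend in the cycle, then the proxy forwards it.
        Director: func(req *http.Request) {
            next := atomic.AddUint64(&counter, 1) % uint64(len(backends))
            target := backends[next]
            req.URL.Scheme = target.Scheme
            req.URL.Host = target.Host
        },
    }

    // The balancer relays whatever the chosen server returns.
    // It never inspects or validates the response.
    http.ListenAndServe(":80", proxy)
}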

The Four Algorithms

Round-robin cycles through servers in order. Simple and predictable. Works well when requests have similar cost. Breaks down when some requests are cheap and some are expensive — a server can be overwhelmed while round-robin keeps sending it more.

Least connections sends each new request to the server with the fewest active connections. Better for variable-cost requests. The balancer must track connection counts in real time.
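
A sketch of that bookkeeping, with hypothetical server names. The counts stay honest only if every dispatch is paired with a release when the request completes.

package main

import (
    "fmt"
    "math"
    "sync"
)

// Pool tracks active connections per server. A real balancer
// derives these counts from live TCP state.
type Pool struct {
    mu    sync.Mutex
    conns map[string]int // server name -> active connections
}

// Acquire picks the server with the fewest active connections
// and records the new connection against it.
func (p *Pool) Acquire() string {
    p.mu.Lock()
    defer p.mu.Unlock()
    best, bestCount := "", math.MaxInt
    for server, n := range p.conns {
        if n < bestCount {
            best, bestCount = server, n
        }
    }
    p.conns[best]++
    return best
}

// Release must run when the request finishes; otherwise the counts
// drift and the balancer routes on stale data.
func (p *Pool) Release(server string) {
    p.mu.Lock()
    defer p.mu.Unlock()
    p.conns[server]--
}

func main() {
    // The same counts as the diagram further down: S2 wins.
    p := &Pool{conns: map[string]int{"s1": 10, "s2": 3, "s3": 7}}
    server := p.Acquire()
    fmt.Println("next request goes to", server)
    p.Release(server)
}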

IP hash maps the client's IP address to a specific server. The same client always hits the same server — useful when session state lives on the server. The tradeoff: uneven distribution if a few IPs generate most traffic (FM6 — Hotspotting).

Consistent hashing places both servers and requests on a ring. A request goes to the next server clockwise from its hash position. When a server joins or leaves the pool, only the requests that mapped to that server are redistributed — not all requests. This is the algorithm behind distributed caches: add a cache node and only a fraction of keys move, so most cached entries stay valid. A modulo scheme like hash(IP) % pool size remaps nearly every client when the pool size changes; consistent hashing does not.

Round-Robin:
  R1 → S1, R2 → S2, R3 → S3, R4 → S1 ...

Least Connections:
  S1: 10 conns, S2: 3 conns, S3: 7 conns
  Next request → S2

IP Hash:
  Client 1.2.3.4 → always S2 (nearly all clients remap if pool size changes)

Consistent Hash:
  Ring: S1──S2──S3──(wrap)
  Key hash lands between S1 and S2 → routes to S2
  Add S4: only keys between S3 and S4 move; all others unchanged
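
A sketch of the ring, with hypothetical node names. A production implementation would also place many virtual nodes per server on the ring to smooth the distribution.

package main

import (
    "fmt"
    "hash/fnv"
    "sort"
)

// Ring maps hash positions to nodes. Keys route to the first node
// at or clockwise past their own hash position.
type Ring struct {
    points []uint32          // sorted node positions on the ring
    nodes  map[uint32]string // position -> node name
}

func hashOf(s string) uint32 {
    h := fnv.New32a()
    h.Write([]byte(s))
    return h.Sum32()
}

func (r *Ring) Add(node string) {
    if r.nodes == nil {
        r.nodes = map[uint32]string{}
    }
    p := hashOf(node)
    r.nodes[p] = node
    r.points = append(r.points, p)
    sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
}

// Get walks clockwise from the key's hash to the next node,
// wrapping around at the end of the ring.
func (r *Ring) Get(key string) string {
    h := hashOf(key)
    i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
    if i == len(r.points) {
        i = 0 // wrapped past the last node
    }
    return r.nodes[r.points[i]]
}

func main() {
    r := &Ring{}
    for _, n := range []string{"s1", "s2", "s3"} {
        r.Add(n)
    }
    before := r.Get("user-42")
    r.Add("s4") // grow the pool
    after := r.Get("user-42")
    // Only keys whose positions now fall before s4 move to it; every
    // other key keeps its node. Modulo placement (hash % pool size)
    // would remap roughly (n-1)/n of all keys instead.
    fmt.Println(before, "->", after)
}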

Health Checks and Their Limits

Load balancers probe their server pool continuously. A typical health check sends an HTTP GET to /health. The server returns 200. The balancer marks it healthy. The server receives traffic.
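
A sketch of that probe loop, assuming a /health path, a two-second timeout, and a fixed interval; none of these values are standardized.

package main

import (
    "net/http"
    "sync"
    "time"
)

// Checker records the last probe result for each server.
type Checker struct {
    mu      sync.RWMutex
    healthy map[string]bool // base URL -> last probe result
}

// probe marks a server healthy on a 200 from /health and unhealthy
// on a timeout, a connection error, or any other status code.
func (c *Checker) probe(base string) {
    client := &http.Client{Timeout: 2 * time.Second}
    resp, err := client.Get(base + "/health")
    ok := err == nil && resp.StatusCode == http.StatusOK
    if resp != nil {
        resp.Body.Close()
    }
    c.mu.Lock()
    c.healthy[base] = ok
    c.mu.Unlock()
}

// Run probes every server on a fixed interval, forever. Servers
// marked unhealthy would be skipped by the routing logic.
func (c *Checker) Run(servers []string, every time.Duration) {
    for {
        for _, s := range servers {
            go c.probe(s)
        }
        time.Sleep(every)
    }
}

func main() {
    c := &Checker{healthy: map[string]bool{}}
    // Placeholder addresses.
    c.Run([]string{"http://10.0.0.1:8080", "http://10.0.0.2:8080"}, 2*time.Second)
}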

The problem: health checks test what they test and nothing else.

A /health endpoint that returns 200 in ten milliseconds tells the balancer the server can handle HTTP connections. It says nothing about whether the server can reach its database. It says nothing about whether the server's background job queue is full. It says nothing about whether the server handles the specific request type the user is about to send.

A server can pass a health check and fail every real request. The health check tested a different code path.

Deeper health checks help — checking database connectivity, cache reachability, disk space. But they add latency and complexity. They also create a new failure mode: a health check that is too strict takes servers out of rotation unnecessarily. The balance between useful and excessive is a tuning problem, not a solved one.
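
A server-side sketch of both styles. The stub dependency stands in for a real *sql.DB, which exposes the same PingContext method; the 500 ms budget is an assumption.

package main

import (
    "context"
    "net/http"
    "time"
)

// pinger abstracts the dependency check. *sql.DB satisfies it, so a
// real server would assign its opened database handle here.
type pinger interface {
    PingContext(ctx context.Context) error
}

// stubDB is a stand-in that always succeeds.
type stubDB struct{}

func (stubDB) PingContext(ctx context.Context) error { return nil }

var db pinger = stubDB{}

func main() {
    // Shallow: proves only that the HTTP stack is alive. A server can
    // serve this 200 while every real request fails.
    http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    })

    // Deep: also proves the database is reachable. Stricter, and a
    // slow database now pulls this server out of rotation too.
    http.HandleFunc("/health/deep", func(w http.ResponseWriter, r *http.Request) {
        ctx, cancel := context.WithTimeout(r.Context(), 500*time.Millisecond)
        defer cancel()
        if err := db.PingContext(ctx); err != nil {
            http.Error(w, "db unreachable", http.StatusServiceUnavailable)
            return
        }
        w.WriteHeader(http.StatusOK)
    })

    http.ListenAndServe(":8080", nil)
}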

L4 vs L7: What Gets Read

A Layer 4 load balancer operates at the TCP level. It sees source IP, destination IP, and port. It forwards TCP connections without reading the HTTP content inside. Fast. Low overhead. No understanding of what the connection carries.

A Layer 7 load balancer reads HTTP. It sees the URL path, headers, cookies, and body. This enables routing by path — /api/ to one server pool, /static/ to another. It enables header-based routing — requests with a specific cookie go to a canary deployment. It enables TLS termination at the balancer, so backend servers can speak plain HTTP to it.

L4 (TCP):
  Sees: src IP, dst IP, port
  Routes by: IP and port only
  Cost: minimal CPU

L7 (HTTP):
  Sees: URL, headers, cookies, body
  Routes by: content rules
  Cost: parse overhead per request

L7 adds capability and overhead. L4 is simpler and faster. Choosing between them is AT6 — general capability vs specialized performance.
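
A sketch of that L7 routing; the pool addresses and the canary cookie name are placeholders. An L4 balancer could not express any of these rules, because it never parses the request.

package main

import (
    "net/http"
    "net/http/httputil"
    "net/url"
)

// poolProxy forwards to one backend pool; a stand-in for a full
// pool with its own balancing.
func poolProxy(raw string) *httputil.ReverseProxy {
    target, err := url.Parse(raw)
    if err != nil {
        panic(err)
    }
    return httputil.NewSingleHostReverseProxy(target)
}

func main() {
    api := poolProxy("http://10.0.1.1:8080")    // API pool
    static := poolProxy("http://10.0.2.1:8080") // static asset pool
    canary := poolProxy("http://10.0.3.1:8080") // canary deployment

    // Path-based routing: only possible because the URL was parsed.
    mux := http.NewServeMux()
    mux.Handle("/api/", api)
    mux.Handle("/static/", static)

    // Header-based routing: a canary cookie diverts the request
    // before the path rules apply.
    root := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if c, err := r.Cookie("canary"); err == nil && c.Value == "1" {
            canary.ServeHTTP(w, r)
            return
        }
        mux.ServeHTTP(w, r)
    })

    http.ListenAndServe(":80", root)
}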

Sticky Sessions: Where Stateless Breaks Down

Some applications store session state on the server. A user logs in on server one. Server one holds the session. If the next request hits server two, the user appears logged out.

Sticky sessions solve this. IP hash or a session cookie pins each user to one server. The load balancer sends that user's requests to the same server every time.

This is FM6 — Hotspotting. If one server holds sessions for 30% of users (because their IPs hash the same way, as when thousands of users behind one corporate NAT share a single address), that server gets 30% of all traffic regardless of what round-robin or least-connections would do. The distribution is no longer even.

It also breaks scaling. Add a server to handle load and existing users stay pinned to the original servers. The new server sits underutilised until new users arrive. The load does not rebalance.

The correct fix is to store session state outside the server — in a cache or database shared by all servers. Then any server can handle any request. Sticky sessions are a workaround for a stateful server design, not a feature.
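
A sketch of that design, with a hypothetical Store interface standing in for the shared cache or database. Any server holding a reference to the same store can answer any user's request.

package main

import (
    "errors"
    "net/http"
    "sync"
)

// Store is shared by every server in the pool. With session state
// behind it, stickiness becomes unnecessary.
type Store interface {
    Get(sessionID string) (userID string, err error)
}

// memStore is an in-process stand-in. A real deployment would back
// this with a shared cache or database visible to all servers.
type memStore struct {
    mu sync.Mutex
    m  map[string]string
}

func (s *memStore) Get(id string) (string, error) {
    s.mu.Lock()
    defer s.mu.Unlock()
    user, ok := s.m[id]
    if !ok {
        return "", errors.New("no session")
    }
    return user, nil
}

// handler resolves the session from the shared store, so the lookup
// works identically on server 1, 2, or 3.
func handler(store Store) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        c, err := r.Cookie("session")
        if err != nil {
            http.Error(w, "not logged in", http.StatusUnauthorized)
            return
        }
        user, err := store.Get(c.Value)
        if err != nil {
            http.Error(w, "not logged in", http.StatusUnauthorized)
            return
        }
        w.Write([]byte("hello, " + user))
    }
}

func main() {
    store := &memStore{m: map[string]string{"abc123": "alice"}}
    http.ListenAndServe(":8080", handler(store))
}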

The Balancer Is Also FM1

The load balancer sits in front of everything. If it goes down, everything goes down. The balancer that protects you from single points of failure is itself a single point of failure.

The standard answer is a pair of balancers in active-passive or active-active configuration. A virtual IP floats between them. If the primary fails, the secondary takes over. This reduces FM1 risk on the balancer itself, at the cost of added configuration and a more complex failover path.

          [Virtual IP]
               │
        ┌──────┴──────┐
        ▼             ▼
  [Balancer A]  [Balancer B]
  (active)      (standby)
        │             │
        └──────┬──────┘
               ▼
         Server Pool

Even with paired balancers, the virtual IP handoff takes seconds. Requests in flight during failover drop. FM1 is reduced, not eliminated.

What the Guarantee Covers

A load balancer is a distribution mechanism. It gives you horizontal scale — add servers, distribute load. It gives you basic fault isolation — a dead server stops receiving requests after health check failure. It gives you a single ingress point, so callers never need to know individual server addresses.

It does not give you correct behavior from your servers. It does not give you fast responses if your servers are slow. It does not give you session state if your servers are stateful. It does not give you protection from a bug that affects all servers simultaneously.

Knowing what a tool guarantees is how you know where the gaps are. The gaps are where incidents happen.