How It Works

Routing Algorithms

// Algorithm 1: Round-robin
// Each request goes to the next server in sequence.
// Simple. Assumes all servers are equally capable and equally loaded.
request_counter = 0  // shared state, persists across calls

function round_robin(servers, request):
    index = request_counter % length(servers)
    request_counter += 1
    return servers[index]
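
The round-robin pseudocode relies on a counter that outlives each call. A minimal Python sketch of that idea follows; the class name `RoundRobinBalancer`, the method `pick`, and the server names are illustrative assumptions, and the lock is only there because a real balancer would usually serve requests concurrently.

```python
import itertools
import threading


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation (hypothetical helper class)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("server pool must not be empty")
        self._servers = list(servers)
        self._counter = itertools.count()  # persistent request counter
        self._lock = threading.Lock()

    def pick(self):
        # The lock keeps the counter consistent under concurrent requests.
        with self._lock:
            index = next(self._counter) % len(self._servers)
        return self._servers[index]


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [balancer.pick() for _ in range(5)]
print(picks)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```

After the third request the rotation wraps back to the first server, regardless of how busy each server is.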

// Algorithm 2: Least connections
// Each request goes to the server with fewest active connections.
// Better for heterogeneous workloads where requests have varying duration.
function least_connections(servers, request):
    return server with minimum active_connections in servers
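
The selection step above can be sketched in Python with `min` and a key function. The `Server` dataclass and the connection counts are made up for illustration; a real balancer would update `active_connections` as connections open and close.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    active_connections: int = 0


def least_connections(servers):
    # min() returns the first server with the smallest count, so ties
    # go to whichever server appears earliest in the list.
    return min(servers, key=lambda s: s.active_connections)


pool = [Server("app-1", 12), Server("app-2", 3), Server("app-3", 7)]
chosen = least_connections(pool)
chosen.active_connections += 1  # the caller must decrement when the request ends
print(chosen.name)  # app-2
```

The sketch scans the whole pool on every request, which is fine for small pools; larger pools may want a heap or similar structure.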

// Algorithm 3: IP hash (sticky routing)
// Each request from the same client IP goes to the same server.
// Required when sessions are stored on the server (stateful services).
// Breaks down if the client IP changes (mobile networks, NAT) or a server fails:
// any change to length(servers) remaps most clients to a different server.
function ip_hash(servers, request):
    index = hash(request.client_ip) % length(servers)
    return servers[index]
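
A small Python sketch of IP-hash routing, under the assumption that a stable digest is wanted: Python's built-in `hash()` is randomized per process for strings, so `hashlib` is used here instead. The function signature and the example address are illustrative only.

```python
import hashlib


def ip_hash(servers, client_ip):
    # A stable digest means the same IP maps to the same index in every
    # process and after every restart.
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["app-1", "app-2", "app-3"]
first = ip_hash(servers, "203.0.113.7")
second = ip_hash(servers, "203.0.113.7")
print(first == second)  # True
```

The mapping is only stable while the pool keeps the same length and order; adding or removing a server changes the modulus and moves most clients.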

// Algorithm 4: Consistent hashing (from Book 2, Ch 15)
// A membership change reroutes only about K/N keys (K keys, N servers), not all keys.
// Used for distributed caches: same key always maps to same server,
// minimising cache invalidation when the pool changes.
function consistent_hash(ring, request):
    position = hash(request.key)
    return ring.get_node(position)  // next node clockwise on ring
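
One way to picture `ring.get_node` is a sorted list of ring positions searched with `bisect`. The sketch below is an assumption about how such a ring might be built, not the book's implementation: the `HashRing` class, the `vnodes` parameter (virtual nodes per server), and the cache names are all hypothetical.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    # hashlib gives the same result across processes, unlike built-in hash().
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node is placed at several positions to even out the load.
        for i in range(self._vnodes):
            point = stable_hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's position, wrapping to the start.
        i = bisect.bisect_right(self._points, stable_hash(key))
        return self._owner[self._points[i % len(self._points)]]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.remove_node("cache-b")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys that lived on the removed node change owner.
print(all(before[k] == "cache-b" for k in moved))  # True
```

Keys owned by the surviving nodes keep their owner, which is the property that limits cache invalidation when the pool changes.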

Health Checks: The Feedback Loop

A load balancer without health checks is dangerous: it keeps routing to a failing server, and those requests disappear into a black hole.

// Health check loop: runs continuously for each server
function health_check_loop(server, interval_seconds):
    while true:
        sleep(interval_seconds)
        result = probe(server, timeout=2s)
        if result == HEALTHY:
            // Only reset the failure count here. Returning an UNHEALTHY
            // server to the pool is the recovery loop's job (hysteresis).
            server.consecutive_failures = 0
        else:
            server.consecutive_failures += 1
            // Guard on state so removal and the alert fire once per outage
            if server.state == HEALTHY and server.consecutive_failures >= 3:
                server.state = UNHEALTHY
                remove_from_pool(server)
                alert("Server " + server.id + " marked unhealthy")

// Probe types:
// TCP probe: can we establish a TCP connection? (Layer 4 check)
// HTTP probe: does GET /health return 200? (Layer 7 check)
// Custom probe: does the application report it is ready to serve?

// Hysteresis: re-adding a server requires N consecutive successes,
// not just one success, to prevent flapping.
function recovery_check_loop(server, interval_seconds):
    server.consecutive_successes = 0
    while server.state == UNHEALTHY:
        sleep(interval_seconds)
        result = probe(server, timeout=2s)
        if result == HEALTHY:
            server.consecutive_successes += 1
            if server.consecutive_successes >= 3:
                server.state = HEALTHY
                server.consecutive_failures = 0
                add_to_pool(server)
        else:
            server.consecutive_successes = 0  // a failure restarts the count
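
The two loops together form a small state machine with hysteresis. The Python sketch below isolates that logic from probing and sleeping so it can be exercised directly. The `ServerHealth` class, its `record` method, and both threshold constants are illustrative assumptions (the thresholds mirror the value 3 used in the pseudocode).

```python
from dataclasses import dataclass

FAIL_THRESHOLD = 3     # consecutive failures before removal
RECOVER_THRESHOLD = 3  # consecutive successes before re-adding


@dataclass
class ServerHealth:
    healthy: bool = True
    failures: int = 0
    successes: int = 0

    def record(self, probe_ok: bool) -> bool:
        """Apply one probe result; return True if the state changed."""
        if probe_ok:
            self.failures = 0
            self.successes += 1
            if not self.healthy and self.successes >= RECOVER_THRESHOLD:
                self.healthy = True
                return True
        else:
            self.successes = 0  # a failure breaks the success streak
            self.failures += 1
            if self.healthy and self.failures >= FAIL_THRESHOLD:
                self.healthy = False
                return True
        return False


health = ServerHealth()
probes = [False, False, False, True, False, True, True, True]
states = []
for ok in probes:
    health.record(ok)
    states.append(health.healthy)
print(states)  # [True, True, False, False, False, False, False, True]
```

The single success after the third failure does not bring the server back, and the failure that follows it resets the streak, so recovery only happens after three clean probes in a row.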

Avoiding the Load Balancer as a SPOF

The load balancer must not itself be a single point of failure. Three patterns:

// Pattern 1: Active-passive pair
// Primary handles all traffic. Secondary monitors primary.
// If primary fails, secondary takes over using a shared virtual IP (VIP).

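function primary_health_monitor(primary, secondary, virtual_ip):
    // Stop monitoring once failover has happened, so the VIP is
    // reassigned and the alert is sent only once.
    while secondary.state != ACTIVE:
        if primary.state == UNHEALTHY:
            reassign_virtual_ip(virtual_ip, to=secondary)
            secondary.state = ACTIVE
            alert("Failover: secondary now active")
        sleep(1s)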

// Pattern 2: DNS-level load balancing
// Multiple A records for the same domain.
// Clients connect to different IPs.
// Simpler but DNS TTL means failover is slow (seconds to minutes).
// Used for geographic distribution, not fast failover.

// Pattern 3: Anycast routing
// Multiple servers announce the same IP prefix via BGP.
// Routers direct packets to the nearest server in routing terms,
// which is usually, but not always, the geographically closest one.
// Used by CDNs and DNS providers for global scale.

Connection Draining

When a server is removed from the pool, existing connections should not be abruptly terminated:

function graceful_remove(server, drain_timeout_seconds):
    server.state = DRAINING
    // No new connections routed to this server
    // Existing connections are allowed to complete
    deadline = now() + drain_timeout_seconds
    while server.active_connections > 0 and now() < deadline:
        sleep(1s)
    // After the deadline, forcibly close whatever is still open
    force_close_connections(server)  // hypothetical helper
    server.state = REMOVED
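
A Python sketch of the same drain logic, written so the clock and sleep function can be swapped out for testing. The `Server` dataclass, the `clock`, `sleep`, and `poll` parameters, and the simulated traffic are assumptions for illustration; zeroing `active_connections` stands in for actually closing sockets.

```python
import time
from dataclasses import dataclass


@dataclass
class Server:
    active_connections: int = 0
    state: str = "ACTIVE"


def graceful_remove(server, drain_timeout, clock=time.monotonic,
                    sleep=time.sleep, poll=1.0):
    """Drain a server; return how many connections had to be force-closed."""
    server.state = "DRAINING"  # the router must skip DRAINING servers
    deadline = clock() + drain_timeout
    while server.active_connections > 0 and clock() < deadline:
        sleep(poll)
    leftover = server.active_connections
    server.active_connections = 0  # stand-in for force-closing sockets
    server.state = "REMOVED"
    return leftover


# Simulated run: a fake clock advances on each sleep, and one connection
# finishes per tick.
fake_now = [0.0]
busy = Server(active_connections=5)


def fake_sleep(seconds):
    fake_now[0] += seconds
    if busy.active_connections > 0:
        busy.active_connections -= 1


forced = graceful_remove(busy, drain_timeout=3,
                         clock=lambda: fake_now[0], sleep=fake_sleep)
print(forced, busy.state)  # 2 REMOVED
```

Three of the five connections finish within the 3-second window; the remaining two are cut off at the deadline. A monotonic clock is used by default so wall-clock adjustments cannot shorten or extend the drain.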
