Distributed Locks: How They Work and Why They Fail

Two services. One shared resource. No coordination. The result is not a race condition — it is a guarantee of one.

A distributed lock is the primitive that turns "two processes might collide" into "exactly one process runs at a time." It is one of the most commonly needed and most commonly broken mechanisms in distributed systems. Engineers reach for it instinctively and implement it incorrectly in ways that do not fail loudly — they fail silently, intermittently, and in production.

The Problem a Distributed Lock Solves

In a single-process system, a mutex works. One thread acquires it. Others wait. The operating system enforces the exclusion at the hardware level.

Distributed systems have no shared hardware. Two services running on two machines have no common memory, no common clock, and no way to coordinate directly. Yet they often need to coordinate: one cron job should run, not two. One write to a shared record should proceed, not a concurrent pair that overwrites each other. One payment should process, not a duplicate that charges the user twice.

The distributed lock solves this by introducing a third party — a coordination service — that both processes agree to consult before acting.

How a Distributed Lock Works

The core protocol is simple:

  1. Process A writes a key to a coordination service (Redis, etcd, ZooKeeper) with a short TTL
  2. If the write succeeds, Process A holds the lock
  3. Process B tries the same write — it fails because the key exists
  4. Process B waits and retries
  5. When Process A finishes, it deletes the key
  6. Process B's next retry succeeds

The TTL is the safety valve. If Process A crashes before deleting the key, the lock expires automatically. Without the TTL, a crashed process holds the lock forever.

Process A                   Redis                   Process B
   │                           │                          │
   │─── SET lock:job NX PX 5000 ─▶│                       │
   │◀── OK ─────────────────────│                         │
   │  (holds lock)             │                          │
   │                           │─── SET lock:job NX PX 5000 ◀─│
   │                           │──── (nil) ──────────────▶│
   │                           │  (key exists, fails)     │
   │  ... do work ...          │                          │
   │─── DEL lock:job ──────────▶│                         │
   │                           │                          │
   │                           │─── SET lock:job NX PX 5000 ◀─│
   │                           │──── OK ─────────────────▶│
   │                           │                          │  (holds lock)

The Redis command SET key value NX PX ttl does this atomically. NX means "only if not exists." PX ttl sets the expiry in milliseconds. One round trip. One atomic operation. No race condition on the acquisition itself.

Where It Fails

1. Expiry before work completes — and why renewal is not enough

Process A acquires a lock with a 5-second TTL. Its work takes 7 seconds due to a slow database query. At second 5, the lock expires. Process B acquires it. Both processes now believe they hold the lock and proceed concurrently.

Lock renewal (periodically extending the TTL) helps but does not eliminate this. A garbage-collection pause, an OS scheduling preemption, or a network hiccup can all pause Process A for longer than the renewal interval. The root problem is that the storage layer receiving the writes has no way to distinguish A's writes from B's writes.

The correct mitigation is fencing tokens. When a lock is acquired, the lock service issues a monotonically increasing token. Every write to the protected resource must include this token. The storage layer rejects writes whose token is lower than the highest token it has seen:

A acquires lock → token=7
A pauses 11 seconds → lock expires
B acquires lock → token=8
B writes result (token=8) → accepted
A resumes, writes result (token=7) → REJECTED (8 > 7)

Fencing tokens shift the safety guarantee from the lock service to the storage layer. Even if the TTL expires mid-operation, stale writes are rejected where it matters most.

2. Deleting someone else's lock

Process A holds a lock. The TTL expires. Process B acquires the lock. Process A finishes and deletes the key — deleting Process B's lock.

Mitigation: Include a unique token in the lock value. Before deleting, verify the token matches. Use a Lua script in Redis to make the check-and-delete atomic:

if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("del", KEYS[1])
else
  return 0
end

3. Network partition

The coordination service becomes unreachable. Process A cannot acquire or release locks. The system halts — or worse, both processes proceed without coordination because they assumed "can't reach Redis" means "safe to proceed."

Mitigation: Fail closed. If the coordination service is unreachable, do not proceed. The alternative — proceeding without a lock — is the exact scenario the lock was designed to prevent.

4. Clock skew

Distributed systems do not share a clock. Two machines' clocks can drift apart by tens of milliseconds or more. A lock that relies on wall-clock time for TTL calculation can expire earlier on one machine than expected.

Mitigation: Use monotonic time locally for lock duration. Let the coordination service manage expiry — its clock is the only one that matters for TTL.

ZooKeeper: When You Need Session-Based Locking

Redis locks have a fundamental design tension: the TTL is an approximation of how long the holder needs. Too short and it expires mid-operation; too long and a crashed holder blocks the system for minutes.

ZooKeeper sidesteps this with ephemeral nodes — nodes that are automatically deleted when the client session that created them ends. A crashed process has its session expire (within the configured session timeout, typically 4–40 seconds), and ZooKeeper deletes the lock automatically. No TTL to miscalibrate.

Process A creates /locks/job-42/lock-0000000001 (ephemeral)
A holds the lock — it has the lowest sequence number
B creates /locks/job-42/lock-0000000002 (ephemeral)
B watches A's node

A crashes → session expires → node deleted → B acquires lock

The cost: ZooKeeper requires running a quorum cluster (minimum 3 nodes, typically 5 in production) and is a serious operational commitment. For most applications, Redis with fencing tokens is sufficient. ZooKeeper is appropriate when lock correctness is a hard safety requirement — financial transactions, database record updates, payment deduplication.

Redlock: When One Redis Instance Is Not Enough

A single Redis instance is a single point of failure. If it goes down, all lock acquisition fails. For high-availability scenarios, the Redlock algorithm acquires a lock on a majority of N independent Redis instances (typically 5). The lock is held only if acquisition succeeds on at least (N/2 + 1) instances within a time bound.

Redlock is controversial. Martin Kleppmann's analysis identified scenarios where it fails under process pauses and clock drift. For most applications, a single Redis instance with replication is sufficient. Redlock is worth the complexity only when lock correctness is a hard safety requirement, not a performance optimization.

When You Don't Need a Lock: CAS

Many workloads do not need a lock if write contention is low. Compare-and-swap (CAS) is an atomic operation that updates a value only if it currently matches an expected value. The storage system provides the atomicity — no external lock service required.

Read record: {version: 5, balance: 100}
Compute new state: {version: 6, balance: 90}
Write with condition: "only if version == 5"
  → Success: we were the only writer
  → ConditionalCheckFailed: someone else wrote first — retry

DynamoDB's ConditionExpression, PostgreSQL's WHERE version = $1, and Redis's WATCH/MULTI/EXEC all implement this. When conflicts are rare, CAS is faster than acquiring a distributed lock. When many workers compete for the same record simultaneously, CAS degrades — every writer retries repeatedly, amplifying load.

The practical rule: use CAS when conflicts are rare and retries are cheap. Use distributed locks when conflicts are common and wasted work is unacceptable.

Distributed Locks vs. Transactions

A distributed lock is not a transaction. A transaction ensures atomicity across multiple operations on one database. A distributed lock ensures mutual exclusion across multiple processes. They solve different problems.

Using a distributed lock to simulate a transaction is a mistake that produces correctness gaps. Using a transaction where a distributed lock is needed is a mistake that produces unnecessary coupling to one database.

The Right Use Cases

Distributed locks fit a specific pattern: one operation per unit of time, enforced across multiple processes. Common examples:

  • Cron jobs that should not run concurrently
  • Cache warming tasks that should run once when cache is cold
  • Payment processing where double-execution causes real harm
  • Leader election where one node should coordinate work

They do not fit: rate limiting (use a counter with TTL instead), read access control (use permissions instead), or enforcing data integrity across records (use transactions instead).