The Computing Series

Real Systems

Netflix Hystrix was the circuit breaker library that popularised the pattern in microservices architectures. Open-sourced by Netflix in 2012, Hystrix wrapped remote calls in commands guarded by a circuit breaker with configurable failure thresholds, fallback functions, and a real-time dashboard (the Hystrix Dashboard) that showed circuit state across services. Hystrix has been in maintenance mode since 2018: its README recommends resilience4j for new projects, and Netflix itself shifted its focus toward adaptive concurrency limiting. Its design nevertheless shaped most of the circuit breaker implementations that followed.

resilience4j is a lightweight Java fault-tolerance library and the usual successor to Hystrix. It computes the failure rate over a sliding window (count-based or time-based), supports a separate slow-call rate threshold (so slow responses, not just errors, can trip the circuit), and provides a Bulkhead implemented with either a semaphore or a fixed thread pool with a bounded queue. resilience4j integrates with Spring Boot Actuator to expose circuit state through health endpoints, and with Micrometer to publish metrics.
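
The count-based window described above can be sketched in a few lines of Python. This is an illustrative model only, not resilience4j's code or API: the class name `CountBasedWindow`, its field names, and the default thresholds are all assumptions chosen for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CountBasedWindow:
    """Remembers the outcome of the last `size` calls (hypothetical helper)."""
    size: int = 10
    failure_rate_threshold: float = 50.0    # percent, assumed default
    slow_call_rate_threshold: float = 50.0  # percent, assumed default
    slow_call_duration_s: float = 1.0       # calls longer than this are "slow"
    outcomes: deque = field(init=False)

    def __post_init__(self):
        # A bounded deque drops the oldest outcome once the window is full.
        self.outcomes = deque(maxlen=self.size)

    def record(self, failed: bool, duration_s: float) -> None:
        slow = duration_s > self.slow_call_duration_s
        self.outcomes.append((failed, slow))

    def failure_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return 100.0 * sum(failed for failed, _ in self.outcomes) / len(self.outcomes)

    def slow_call_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return 100.0 * sum(slow for _, slow in self.outcomes) / len(self.outcomes)

    def should_open(self) -> bool:
        # Do not judge the dependency until the window holds enough samples.
        if len(self.outcomes) < self.size:
            return False
        return (self.failure_rate() >= self.failure_rate_threshold
                or self.slow_call_rate() >= self.slow_call_rate_threshold)
```

In this sketch a dependency that never returns an error can still trip the circuit: once half of the recorded calls exceed `slow_call_duration_s`, `should_open()` returns true even though `failure_rate()` is zero.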

Polly is the .NET equivalent: a resilience and transient-fault-handling library that provides circuit breakers, retry policies, bulkheads, and fallbacks as composable policy objects. Polly policies are stacked: a call can be wrapped in a retry policy (retry on transient errors) inside a circuit breaker (stop retrying if failure rate exceeds threshold) inside a bulkhead (limit concurrency). The composition of policies enables fine-grained control over the failure behaviour of each remote call.
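
The nesting order described above (retry inside circuit breaker inside bulkhead) can be illustrated with plain function wrappers. The sketch below is written in Python and does not use or imitate Polly's API; `with_retry`, `with_breaker`, and `with_bulkhead` are hypothetical names, and the breaker is deliberately reduced to a consecutive-failure counter with no recovery so the composition stays visible.

```python
import threading


class CircuitOpen(Exception):
    """Raised when the (simplified) breaker refuses to make the call."""


class BulkheadFull(Exception):
    """Raised when no concurrency slot is available."""


def with_retry(call, attempts=3):
    def wrapped():
        last_error = None
        for _ in range(attempts):
            try:
                return call()
            except ConnectionError as err:  # treated as the transient error
                last_error = err
        raise last_error
    return wrapped


def with_breaker(call, max_failures=2):
    state = {"failures": 0}

    def wrapped():
        if state["failures"] >= max_failures:
            raise CircuitOpen("failing fast")
        try:
            result = call()
        except Exception:
            state["failures"] += 1  # counted once per exhausted retry cycle
            raise
        state["failures"] = 0
        return result
    return wrapped


def with_bulkhead(call, max_concurrent=2):
    slots = threading.BoundedSemaphore(max_concurrent)

    def wrapped():
        if not slots.acquire(blocking=False):
            raise BulkheadFull("too many concurrent calls")
        try:
            return call()
        finally:
            slots.release()
    return wrapped
```

A call would be wrapped as `with_bulkhead(with_breaker(with_retry(fetch)))`, where `fetch` stands for any zero-argument remote call. Because the retry sits innermost, the breaker sees one failure per exhausted retry cycle rather than one per attempt, and once it trips, later calls are rejected without any retries being attempted.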


Concept: Circuit Breakers

Thread: T7 (State Machines) ← Book 1, Ch 29 (State Machine Transitions) → Book 5, Ch 11 (Resilience Patterns)

Core Idea: A circuit breaker is a three-state machine (closed, open, half-open) that fast-fails requests to a failing dependency, preventing thread pool exhaustion and cascading failures while automatically detecting recovery.

Tradeoff: AT7 — Automation/Control (circuit breaker automates isolation and recovery detection; threshold miscalibration causes false trips on transient failures or insufficient protection against slow dependencies)

Failure Mode: FM11 — Observability Blindness (a circuit breaker that opens without alerting hides the downstream failure — features stop working silently with no visible error in the upstream service)

Signal: When a service’s error rate rises while its P99 latency drops at the same moment (errors are coming back faster than before), a circuit breaker guarding one of its dependencies has most likely opened: fast failures are replacing slow timeouts.

Maps to: Reference Book, Framework 2 (Failure Modes) and Framework 7 (Automation/Control)
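
As a companion to the Core Idea and the FM11 failure mode, here is a minimal Python sketch of the state machine with a transition hook for alerting. It assumes single-threaded use and a consecutive-failure threshold rather than a failure rate; the names `CircuitBreaker`, `CircuitOpenError`, `on_transition`, and the default values are illustrative, not taken from any of the libraries above.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # calls pass through; failures are counted
    OPEN = "open"            # calls fail fast without touching the dependency
    HALF_OPEN = "half_open"  # a trial call decides whether to close or reopen


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, open_seconds=30.0,
                 clock=time.monotonic, on_transition=None):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.clock = clock                  # injectable so tests can fake time
        self.on_transition = on_transition  # hook for logging or alerting
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def _move_to(self, new_state):
        old_state, self.state = self.state, new_state
        if new_state is State.OPEN:
            self.opened_at = self.clock()
        if self.on_transition is not None and old_state is not new_state:
            self.on_transition(old_state, new_state)

    def call(self, fn):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.open_seconds:
                raise CircuitOpenError("failing fast")
            self._move_to(State.HALF_OPEN)  # wait elapsed: permit a trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if (self.state is State.HALF_OPEN
                    or self.failures >= self.failure_threshold):
                self._move_to(State.OPEN)
            raise
        self.failures = 0
        self._move_to(State.CLOSED)
        return result
```

Passing something like `on_transition=lambda old, new: print(old.value, "->", new.value)` makes every trip and recovery visible; in a real service the hook would more plausibly emit a metric or page an operator, which is the guard against a circuit that opens silently.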

