In 2012, Instagram had thirteen engineers and 30 million users. In 2014, WhatsApp had 55 engineers and 600 million users. Neither company had an unusually talented team. Both had designed systems that scaled. The engineers who built them were not doing different work — they were making different decisions: which constraints to impose, which tradeoffs to accept, which failure modes to plan for.
Scale is not a property of a system. It is a constraint that the system must satisfy. When that constraint tightens — more users, more data, more requests per second — systems that were designed without it in mind do not slow down gracefully. They break categorically.
This chapter is about what that constraint actually is, how to measure it, and how to reason about it before it becomes a production incident.