Scale has three independent axes:
Throughput: the number of requests a system can handle per unit time. Measured in requests per second (RPS), transactions per second (TPS), or messages per second. A system that handles 1,000 RPS at normal load may not handle 10,000 RPS without architectural changes.
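Latency: the time to handle one request, from initiation to completion. Measured in milliseconds. Tail latency, usually reported as P99 (the 99th percentile, the time within which 99% of requests complete), matters more than the typical case: a system with 5ms median latency and 2,000ms P99 latency is not a 5ms system, because roughly one request in a hundred takes two seconds or longer.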
Storage: the volume of data the system manages. Measured in gigabytes or terabytes. Storage scale affects read/write patterns, index size, and replication cost.
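The gap between typical and tail latency can be made concrete with a small calculation. The following is a minimal sketch, not a production metric pipeline: the sample data is invented for illustration, and the `percentile` helper is a hypothetical function that uses the nearest-rank definition (monitoring tools may interpolate differently).

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    # Multiply before dividing so integer inputs avoid float rounding surprises.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [5.0] * 98 + [2000.0] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"median={p50_ms} ms, mean={mean_ms} ms, p99={p99_ms} ms")
```

With this invented data the median is 5.0 ms and the P99 is 2000.0 ms, while the mean lands at 44.9 ms: a figure that describes neither the fast requests nor the slow ones.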
These three axes are measured independently, but they interact. Increasing throughput while holding latency constant requires adding capacity, because more requests are in flight at the same time. Decreasing latency often requires more memory (caching) or more compute. Scaling storage requires decisions about consistency and replication that affect both throughput and latency.
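One way to see why throughput and latency together dictate capacity is Little's Law, which the text does not name but which is a standard queueing result: the average number of requests in flight equals throughput multiplied by average latency. The sketch below applies it under simplifying assumptions (steady state, and latency that stays constant as load grows); the function names and the 50 ms latency figure are hypothetical.

```python
import math

def required_concurrency(throughput_rps, latency_ms):
    """Little's Law: average in-flight requests = arrival rate * time in system."""
    return throughput_rps * latency_ms / 1000

def workers_needed(throughput_rps, latency_ms, per_worker_concurrency=1):
    """Workers required if each worker handles a fixed number of requests at once."""
    in_flight = required_concurrency(throughput_rps, latency_ms)
    return math.ceil(in_flight / per_worker_concurrency)

# Assumed average latency of 50 ms at both load levels.
normal_load = workers_needed(1_000, 50)   # 1,000 RPS
peak_load = workers_needed(10_000, 50)    # 10,000 RPS

print(f"workers at 1,000 RPS: {normal_load}")
print(f"workers at 10,000 RPS: {peak_load}")
```

Under these assumptions, a tenfold increase in throughput at unchanged latency needs ten times the concurrent capacity (50 workers versus 500). In practice latency tends to rise as a system nears saturation, so this is a lower bound rather than a sizing formula.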
The three axes define the scaling triangle. Any architecture occupies a point in this triangle. Moving the point requires tradeoffs.