The Options and Tradeoffs

Engineering investment decisions have a standard economic structure. The investment cost is the engineering weeks to build, plus the infrastructure cost to operate, plus the maintenance overhead over the system’s lifecycle. The return is the value generated — either directly (revenue, cost reduction) or indirectly (velocity improvement, risk reduction). The time to return is the lag between investment and value.

The cost of operating systems is systematically underestimated because it is distributed across multiple cost centres that are rarely aggregated. Compute and network costs are visible in the cloud bill. People costs — on-call load, incident response, maintenance time, the cognitive overhead of understanding a complex system — are not on the cloud bill but are often larger than the infrastructure costs.
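One way to make the aggregation concrete is to put infrastructure and people costs in the same monthly ledger. The sketch below is illustrative only: the class name, the fields, the loaded hourly rate, and every figure are hypothetical assumptions, and it leaves out cognitive overhead because that is hard to express in hours.

```python
from dataclasses import dataclass


@dataclass
class MonthlyOperatingCost:
    """Hypothetical monthly cost ledger for one system (all fields are assumptions)."""
    compute: float             # from the cloud bill
    network: float             # from the cloud bill
    on_call_hours: float       # people time, not on the cloud bill
    incident_hours: float      # people time, not on the cloud bill
    maintenance_hours: float   # people time, not on the cloud bill
    loaded_hourly_rate: float  # assumed fully loaded cost of one engineer-hour

    def infrastructure(self) -> float:
        """Costs that are visible in the cloud bill."""
        return self.compute + self.network

    def people(self) -> float:
        """Costs that are spread across teams and rarely aggregated."""
        hours = self.on_call_hours + self.incident_hours + self.maintenance_hours
        return hours * self.loaded_hourly_rate

    def total(self) -> float:
        return self.infrastructure() + self.people()


# Illustrative figures only; none of these come from a real system.
cost = MonthlyOperatingCost(
    compute=12_000.0,
    network=3_000.0,
    on_call_hours=80.0,
    incident_hours=40.0,
    maintenance_hours=120.0,
    loaded_hourly_rate=100.0,
)

print(f"infrastructure: {cost.infrastructure():,.0f}")
print(f"people:         {cost.people():,.0f}")
print(f"total:          {cost.total():,.0f}")
```

With these assumed inputs the people line comes out larger than the infrastructure line, which is the pattern described above. Real figures may of course differ.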

The board presentation has a three-number structure: the current cost, the projected cost at 2x scale, and the cost of the investment required to change the cost curve. A streaming infrastructure investment that costs fourteen engineer-weeks and cuts monthly batch-processing infrastructure cost by forty percent at 2x data volume has a breakeven point, and that breakeven point is calculable. The calculation is not complex. Presenting it changes the conversation from “should we invest in streaming?” to “when does the streaming investment pay off, and do we believe we will reach 2x scale within that horizon?”
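The breakeven arithmetic can be sketched in a few lines. The fourteen engineer-weeks and the forty percent saving come from the example above. The cost of an engineer-week and the monthly batch infrastructure cost at 2x volume are hypothetical placeholders, and the function name is illustrative.

```python
from typing import Optional


def breakeven_months(investment_cost: float, monthly_saving: float) -> Optional[float]:
    """Months until cumulative savings equal the up-front investment.

    Returns None when there is no positive saving, because the
    investment then never pays back.
    """
    if monthly_saving <= 0:
        return None
    return investment_cost / monthly_saving


# From the example in the text.
ENGINEER_WEEKS = 14
SAVING_FRACTION = 0.40

# Hypothetical assumptions; replace with real figures.
COST_PER_ENGINEER_WEEK = 4_000.0   # assumed fully loaded cost
BATCH_COST_AT_2X_MONTHLY = 20_000.0  # assumed monthly batch infra cost at 2x volume

investment = ENGINEER_WEEKS * COST_PER_ENGINEER_WEEK
monthly_saving = SAVING_FRACTION * BATCH_COST_AT_2X_MONTHLY
months = breakeven_months(investment, monthly_saving)

if months is None:
    print("No positive saving: the investment does not break even.")
else:
    print(f"Investment: {investment:,.0f}")
    print(f"Monthly saving at 2x volume: {monthly_saving:,.0f}")
    print(f"Breakeven after about {months:.1f} months at 2x volume")
```

Under these assumed inputs the investment pays back roughly seven months after 2x volume is reached. This simple version ignores the streaming system's own operating cost and any savings before 2x volume, both of which a real board calculation would probably need to include.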

AT2 (Latency vs Throughput) is an economic decision at scale. Lower latency typically requires more infrastructure: more replicas, more cache capacity, more compute. Higher throughput without latency constraints is cheaper per unit of work. When the business model requires low latency, as in a trading system or a real-time recommendation engine, the cost of delivering it follows from a product requirement, not an engineering preference.
