The Computing Series

The Tradeoffs

The tradeoff between model size and deployment constraints is pervasive. Teams that evaluate AI products with large API-hosted models often discover that the model they want to deploy cannot meet their latency, cost, or data-privacy requirements when running at scale on their own infrastructure. The evaluation model and the production model are different systems, and closing the gap between them requires engineering investment.

Evaluation infrastructure has a cost of its own. Building evaluation pipelines, collecting human labels, and running statistical quality checks demands significant engineering investment before it produces value. Teams that skip this work ship AI products without knowing whether those products actually work.
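As a rough illustration of the statistical quality checks mentioned above, the sketch below scores model predictions against human labels and attaches a bootstrap confidence interval, so a quality change can be distinguished from sampling noise. The function name, inputs, and interval method are illustrative assumptions, not anything prescribed by the text.

```python
import random

def evaluate(predictions, labels, n_boot=1000, seed=0):
    """Score predictions against human labels (hypothetical helper).

    Returns overall accuracy plus a 95% bootstrap confidence interval,
    a simple statistical quality check for small evaluation sets."""
    assert len(predictions) == len(labels) and predictions
    # 1 where the model agrees with the human label, 0 otherwise
    hits = [int(p == l) for p, l in zip(predictions, labels)]
    accuracy = sum(hits) / len(hits)
    # Resample the per-example scores to estimate sampling variability
    rng = random.Random(seed)
    boot = sorted(
        sum(rng.choices(hits, k=len(hits))) / len(hits)
        for _ in range(n_boot)
    )
    lo, hi = boot[int(0.025 * n_boot)], boot[int(0.975 * n_boot)]
    return accuracy, (lo, hi)

acc, (lo, hi) = evaluate(["yes", "no", "yes", "yes"],
                         ["yes", "no", "no", "yes"])
print(f"accuracy={acc:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

A wide interval here is itself a finding: it tells the team the labeled set is too small to detect the quality difference they care about, before the product ships.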
