The Computing Series

The Concept

The AI product stack has five layers, each with distinct requirements.

Data is the foundation. The quality, coverage, and freshness of training data determine the ceiling of what the model can do. No serving optimisation, prompt engineering, or fine-tuning can compensate for training data that is systematically biased, incomplete, or stale. Data management for AI products is an ongoing engineering concern, not a one-time collection exercise.
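One way to make that ongoing concern concrete is a recurring audit of the training set. A minimal sketch, assuming hypothetical records with `text`, `label`, and `collected_at` fields and an illustrative 90-day staleness limit:

```python
from datetime import datetime, timedelta, timezone

# Illustrative thresholds and field names; real values depend on the product.
STALENESS_LIMIT = timedelta(days=90)
REQUIRED_FIELDS = {"text", "label", "collected_at"}

def audit_records(records, now=None):
    """Count stale and incomplete records in one pass over the dataset."""
    now = now or datetime.now(timezone.utc)
    stale = incomplete = 0
    for record in records:
        if not REQUIRED_FIELDS <= record.keys():
            incomplete += 1  # missing a required field: coverage problem
            continue
        if now - record["collected_at"] > STALENESS_LIMIT:
            stale += 1  # older than the freshness window: staleness problem
    return {"total": len(records), "stale": stale, "incomplete": incomplete}
```

Run on a schedule, a check like this turns "is our data fresh?" from a one-time question into a monitored metric.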

The model sits on top of the data. Model selection involves tradeoffs between quality, latency, cost, and the ability to fine-tune. A large general-purpose language model produces higher-quality outputs on diverse tasks but has higher inference latency and cost. A smaller task-specific model runs faster and cheaper but fails on inputs outside its training distribution.
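The quality/latency/cost tradeoff can be framed as a constrained choice: fix latency and cost budgets, then take the highest-quality model that fits. A sketch with made-up benchmark numbers (real figures come from measuring the candidate models):

```python
from dataclasses import dataclass

@dataclass
class ModelOption:
    name: str
    quality: float          # offline eval score in [0, 1]; illustrative
    p95_latency_ms: int     # measured tail latency
    cost_per_1k_tokens: float

def pick_model(options, max_latency_ms, max_cost):
    """Return the highest-quality model within the latency and cost budget."""
    feasible = [m for m in options
                if m.p95_latency_ms <= max_latency_ms
                and m.cost_per_1k_tokens <= max_cost]
    return max(feasible, key=lambda m: m.quality) if feasible else None
```

Returning `None` when nothing fits is deliberate: it surfaces the case where the budgets themselves have to be renegotiated rather than silently picking an over-budget model.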

Serving is the infrastructure that delivers model outputs to users with acceptable latency and availability. For most user-facing products, 500 milliseconds is the upper bound on acceptable response time for synchronous interactions. This constraint is severe: it excludes the largest and highest-quality models from synchronous user-facing deployment unless aggressive optimisation is applied.

The application layer is where product logic lives. Prompt construction, output parsing, context management, and fallback handling are application-layer concerns. The application layer is where the product’s behaviour is defined; changes to it can dramatically change the user experience without changing the model.
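Prompt construction, output parsing, and fallback handling can all live in a thin wrapper around the model call. A sketch assuming a hypothetical sentiment task with a JSON output contract (the template and schema are illustrative, not from the text):

```python
import json

# Hypothetical prompt template; the application layer owns this string,
# so it can change without touching the model.
PROMPT_TEMPLATE = (
    'Classify the sentiment of this review. Reply with JSON like '
    '{{"sentiment": "positive"}} or {{"sentiment": "negative"}}.\n\n'
    "Review: {review}"
)

def build_prompt(review: str) -> str:
    return PROMPT_TEMPLATE.format(review=review)

def parse_output(raw: str) -> dict:
    """Parse the model's reply; fall back to a safe default on malformed output."""
    try:
        parsed = json.loads(raw)
        if parsed.get("sentiment") in {"positive", "negative"}:
            return parsed
    except json.JSONDecodeError:
        pass
    return {"sentiment": "unknown"}  # fallback: never crash on bad model output
```

Because the template and the parser are plain application code, changing either one changes the product's behaviour immediately, with no retraining or model swap.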

Evaluation is the layer most commonly missing in early AI products and most consequential in mature ones. Without evaluation infrastructure, the team cannot measure whether the product is working, cannot detect when its quality degrades, and cannot make principled decisions about model upgrades or prompt changes.
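The smallest useful version of that infrastructure is a frozen labelled set and a scoring function, so any candidate system (a new prompt, a new model) gets a comparable number before it ships. A sketch with an invented two-example eval set:

```python
# Illustrative frozen eval set; a real one is larger and version-controlled.
EVAL_SET = [
    {"input": "great product, works perfectly", "expected": "positive"},
    {"input": "broke in a week", "expected": "negative"},
]

def evaluate(predict, eval_set=EVAL_SET):
    """Accuracy of a candidate system on the frozen eval set.

    `predict` is any callable mapping input text to a label, so the same
    harness scores prompt variants, fine-tunes, or entirely different models.
    """
    correct = sum(predict(ex["input"]) == ex["expected"] for ex in eval_set)
    return correct / len(eval_set)
```

Running this on every prompt or model change turns "did we make it better?" into a measured comparison instead of a guess.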
