Three contrasting jobs illustrate how the same broad category — “writing assistant” — produces radically different system requirements.
The job “help me write faster” requires synchronous AI completions with a latency SLO in the range of 200–500 milliseconds. Above that threshold, users perceive a delay and the tool feels slow. This job demands inference infrastructure optimised for speed, small model sizes or aggressive caching, and streaming outputs so the user sees tokens appearing rather than waiting for the full completion.
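A minimal sketch of the latency-first shape, in Python. Everything here is hypothetical (`stream_completion`, `fake_source`, and the budget value are illustrative, not a real inference client); the point is that the SLO attaches to time-to-first-token, and that caching and streaming are the levers.

```python
import time
from typing import Callable, Iterator

# Stands in for the inference backend; injected so the sketch is runnable.
TokenSource = Callable[[str], Iterator[str]]

def stream_completion(prompt: str,
                      source: TokenSource,
                      cache: dict[str, str],
                      slo_breaches: list[str],
                      first_token_budget_s: float = 0.3) -> Iterator[str]:
    """Stream tokens as they arrive; serve whole completions from cache.

    The latency SLO applies to time-to-first-token: once tokens start
    appearing, the tool feels responsive even if the full completion
    takes longer to finish.
    """
    if prompt in cache:
        yield cache[prompt]              # cache hit: effectively instant
        return
    start = time.monotonic()
    collected: list[str] = []
    for i, token in enumerate(source(prompt)):
        if i == 0 and time.monotonic() - start > first_token_budget_s:
            # SLO breach: a real system would emit a metric and
            # fall back to a smaller model; the sketch just records it.
            slo_breaches.append(prompt)
        collected.append(token)
        yield token                      # caller renders immediately
    cache[prompt] = "".join(collected)   # warm the cache for next time

def fake_source(prompt: str) -> Iterator[str]:
    yield from ["Hello", ", ", "world"]  # stands in for model output

cache: dict[str, str] = {}
breaches: list[str] = []
first = "".join(stream_completion("greet", fake_source, cache, breaches))
second = "".join(stream_completion("greet", fake_source, cache, breaches))
```

The second call never touches the model at all, which is what “aggressive caching” buys: the cheapest token is the one you already have.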
The job “help me write better over time” is a different system entirely. Better-over-time requires feedback loops: the system must observe what the user accepted and rejected, store that history, use it to calibrate future suggestions, and surface patterns in the user’s writing that the user themselves may not have noticed. This job is primarily asynchronous. The user does not need the feedback in real time; they need it to be accurate and cumulative. The architecture is a data pipeline, not a latency-sensitive service.
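The pipeline shape can be sketched as an event log plus a batch aggregation. This is an illustration, not a prescribed schema: `SuggestionEvent` and the category names are hypothetical, and in production the aggregation would run as an offline job over stored history rather than over an in-memory list.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class SuggestionEvent:
    """One observation: did the user accept this kind of suggestion?"""
    category: str        # hypothetical labels, e.g. "passive_voice"
    accepted: bool

def acceptance_rates(events: list[SuggestionEvent]) -> dict[str, float]:
    """Batch aggregation over accepted/rejected history.

    The output calibrates future suggestions: categories the user
    consistently rejects get suppressed, and the rates themselves can
    be surfaced as patterns the user may not have noticed.
    """
    shown: Counter = Counter()
    accepted: Counter = Counter()
    for e in events:
        shown[e.category] += 1
        if e.accepted:
            accepted[e.category] += 1
    return {c: accepted[c] / shown[c] for c in shown}

events = [
    SuggestionEvent("passive_voice", True),
    SuggestionEvent("passive_voice", True),
    SuggestionEvent("wordiness", False),
    SuggestionEvent("wordiness", True),
]
rates = acceptance_rates(events)
```

Nothing here is latency-sensitive: the job runs hourly or nightly, and its only obligations are to lose no events and to be right.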
The job “help me not lose my work” is durability-first. The system must acknowledge writes only after they are committed to durable storage. Eventual consistency is unacceptable. Sync-before-acknowledge is the design pattern. Latency is secondary to durability.
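Sync-before-acknowledge looks like this at the filesystem level. A sketch under POSIX assumptions (`durable_write` is an illustrative name): write to a temporary file, fsync it, atomically rename it into place, then fsync the directory so the rename itself survives a crash. Only after all of that may the system tell the user “saved.”

```python
import os
import tempfile

def durable_write(path: str, data: bytes) -> None:
    """Commit data to stable storage before returning (POSIX semantics).

    Returning from this function is the acknowledgement: if it returns,
    a crash afterwards must not lose the write.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())     # data reaches the disk, not just the page cache
        os.replace(tmp, path)        # atomic swap: readers see old or new, never half
        dir_fd = os.open(dirname, os.O_RDONLY)
        try:
            os.fsync(dir_fd)         # make the rename itself durable
        finally:
            os.close(dir_fd)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)           # never leave a stray temp file behind
        raise
```

Every step here costs latency, which is exactly the trade the paragraph names: a write that returns in 2 ms but lives only in the page cache is a lie to the user.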
These three jobs look similar on the surface — they are all “writing assistant” features — but they produce architectures that share almost nothing. A technical leader who does not distinguish between them will build a single system that does all three badly.
The same logic applies to API design. An API that exposes what the system does — “create entity,” “update entity,” “delete entity” — forces the caller to understand the system’s internal model. An API that exposes what the user wants to accomplish — “publish article,” “retract article,” “request review” — lets the system evolve its internal model without breaking callers. The unit of the API surface is the job, not the implementation.
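A job-oriented surface can be sketched as follows; the class and the state machine inside it are hypothetical, chosen only to make the contrast concrete. Callers speak in jobs (`request_review`, `publish_article`, `retract_article`); the `Article` dataclass is the internal model, free to change shape without breaking them.

```python
from dataclasses import dataclass, field

@dataclass
class Article:
    # Internal model: callers never see these fields directly, so the
    # states and their names can evolve without breaking the API.
    title: str
    state: str = "draft"     # draft -> in_review -> published -> retracted
    reviewers: list[str] = field(default_factory=list)

class ArticleService:
    """Job-oriented surface: each method names what the user wants to
    accomplish, not which entities get created, updated, or deleted."""

    def __init__(self) -> None:
        self._articles: dict[str, Article] = {}

    def request_review(self, title: str, reviewer: str) -> None:
        article = self._articles.setdefault(title, Article(title))
        article.reviewers.append(reviewer)
        article.state = "in_review"

    def publish_article(self, title: str) -> None:
        article = self._articles[title]
        if article.state != "in_review":
            # The job encodes the business rule; a bare "update entity"
            # API would push this check onto every caller.
            raise ValueError("only reviewed articles can be published")
        article.state = "published"

    def retract_article(self, title: str) -> None:
        self._articles[title].state = "retracted"
```

Note what the caller never learns: that publishing is an update to a `state` field, or that review requests may create the record. Those are implementation facts, and keeping them off the API surface is what lets them change.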