Reflection Questions

Map an AI product you have worked on or used to the five-layer stack. Which layers are most developed? Which are weakest? What would it take to close the gaps in the weakest layer?
The chapter argues that 500 ms is the practical ceiling for synchronous user-facing AI inference. Identify an AI product feature you would like to build. What is the inference latency of the model you would use? If it exceeds 500 ms, what optimisation strategies would you apply?
Design a cost model for an AI product with one million monthly active users, each making an average of 20 AI requests per day. What are the inference costs at different pricing tiers? At what scale does inference cost dominate operating costs?
The chapter argues that AI output quality is probabilistic and requires statistical evaluation rather than case-by-case testing. What does an evaluation pipeline for a code completion product look like? What metrics would you use? How would you collect ground truth?
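
As a starting point for the cost-model question, the arithmetic can be sketched in a few lines of Python. The user count and request rate come from the question itself. Everything else is an assumption made for illustration: the 30-day month, the three tier names, their per-request prices, and the fixed figure for non-inference operating costs are hypothetical placeholders, not real vendor pricing or figures from the chapter.

```python
# Figures taken from the question.
MONTHLY_ACTIVE_USERS = 1_000_000
REQUESTS_PER_USER_PER_DAY = 20

# Assumption: a 30-day month.
DAYS_PER_MONTH = 30

# Assumption: hypothetical per-request prices in USD. Replace these with
# the real prices of the models you are considering.
HYPOTHETICAL_COST_PER_REQUEST_USD = {
    "small model": 0.0001,
    "mid-size model": 0.001,
    "large model": 0.01,
}

# Assumption: hypothetical monthly operating costs other than inference
# (staff, hosting, support), in USD.
HYPOTHETICAL_OTHER_MONTHLY_COSTS_USD = 500_000


def monthly_requests(users, requests_per_day, days=DAYS_PER_MONTH):
    """Total AI requests per month across all users."""
    return users * requests_per_day * days


def monthly_inference_cost(requests, cost_per_request):
    """Monthly inference spend for a given per-request price."""
    return requests * cost_per_request


def inference_share(inference_cost, other_costs):
    """Fraction of total operating cost attributable to inference."""
    return inference_cost / (inference_cost + other_costs)


if __name__ == "__main__":
    requests = monthly_requests(MONTHLY_ACTIVE_USERS, REQUESTS_PER_USER_PER_DAY)
    print(f"Requests per month: {requests:,}")
    for tier, price in HYPOTHETICAL_COST_PER_REQUEST_USD.items():
        cost = monthly_inference_cost(requests, price)
        share = inference_share(cost, HYPOTHETICAL_OTHER_MONTHLY_COSTS_USD)
        print(f"{tier}: ${cost:,.0f} per month, {share:.0%} of operating costs")
```

Varying `MONTHLY_ACTIVE_USERS` while holding the other inputs fixed gives one way to approach the final part of the question: the scale at which `inference_share` passes 0.5 is the point where inference becomes the largest cost, under these assumed numbers.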