From Prompt to Response: A Step-by-Step Walkthrough of LLM Inference
Update: For a deeper systems-level treatment of LLM inference, especially the interaction between request scheduling, prefill, decode, and KV-cache reuse, see arXiv:2606.24937.