Prompt caching amortizes the cost of a long system prompt by reusing the model's work on the shared prefix across requests: you pay the full input price once to populate the cache, then a discounted rate for each follow-up that reuses it. The prefix must match byte-for-byte, so whitespace, ordering, and dynamic variables all matter; anything that changes per request needs to live after the cached block. Providers typically keep caches warm for about five minutes, which suits bursty traffic well but means quiet workloads fall off the cache and recompute the prefix on every request.
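The byte-for-byte constraint can be sketched with a small request builder. The payload shape, field names, and `STATIC_SYSTEM_PROMPT` here are illustrative assumptions, not any specific provider's API; the point is only the ordering discipline: keep the static prefix literally identical across calls and append per-request data after it.

```python
import json

# Assumption: a long, fully static system prompt. Interpolating any
# per-request value into this string would break the prefix match.
STATIC_SYSTEM_PROMPT = (
    "You are a support assistant for Acme Corp.\n"
    "Follow the policy manual below when answering.\n"
    # ...imagine several thousand tokens of policy text here...
)

def build_messages(user_name: str, question: str) -> list[dict]:
    """Static prefix first; everything dynamic strictly after it."""
    return [
        # Cacheable block: byte-identical on every request.
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},
        # Dynamic block: per-user context lives after the prefix,
        # so it never invalidates the cached portion.
        {"role": "user", "content": f"User: {user_name}\n{question}"},
    ]

a = build_messages("alice", "How do I reset my password?")
b = build_messages("bob", "What is the refund window?")
# The serialized prefix is identical across requests, so a provider
# can reuse its cached work on that span.
assert json.dumps(a[0]) == json.dumps(b[0])
assert a[1] != b[1]
```

A common mistake this guards against is putting a timestamp, request ID, or user name at the top of the system prompt: one changed byte at the front invalidates the entire cached prefix on every call.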