Prompt caching lets the model reuse its work on a repeated prefix of your request: if you send the same long system prompt on every call, you pay to process it in full only once. Subsequent requests that share that prefix come back faster and are billed at a fraction of the normal input price.
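
As a concrete illustration, here is a minimal sketch using Anthropic's Messages API, where caching is opted into per content block via `cache_control` (other providers handle this differently, e.g. by caching long prefixes automatically). The model name and prompt text are placeholders, not recommendations:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_SYSTEM_PROMPT = "...thousands of tokens of instructions and reference docs..."

def ask(question: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # Mark the shared prefix as cacheable: the first call pays a
                # one-time cache-write cost; later calls that reuse the exact
                # same prefix read it back at a discounted input rate.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text

# Only the user question changes between calls, so the long system prompt
# is processed once and reused on every subsequent request.
print(ask("Summarize section 2."))
print(ask("List the key caveats."))
```

One caveat worth noting: the cached prefix must match exactly, byte for byte, so keep anything that varies per request (timestamps, user names) out of the cached portion.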