Cost does not come only from what you type into chat. It adds up across many layers:
- Always-on instructions: AGENTS.md, custom instructions, hooks
- Selected and open files in context
- Chat history and summaries between turns
- MCP tool definitions and their JSON schemas — even the ones you do not use
- Tool call results replayed as input in the next step
- Model output (output tokens are the most expensive)
- Retries, subagents, loops in agent mode
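To see why replayed tool results and agent loops matter, here is a minimal sketch of how input grows across steps. It assumes a simplified loop in which every step resends the base context plus all earlier outputs and tool results. The function name and all token counts are hypothetical and chosen only for illustration.

```python
def agent_loop_input_tokens(base_context: int, tool_result: int,
                            output: int, steps: int) -> list[int]:
    """Return the input tokens sent at each step of a simplified agent loop.

    Assumption: each step resends the base context (instructions, files,
    tool schemas) plus every earlier model output and tool result.
    """
    sent = []
    history = 0
    for _ in range(steps):
        sent.append(base_context + history)
        # The model's output and the tool result both become input next step.
        history += output + tool_result
    return sent


# Hypothetical numbers: 8k always-on context, 3k per tool result, 500 output per step.
per_step = agent_loop_input_tokens(base_context=8_000, tool_result=3_000,
                                   output=500, steps=5)
print(per_step)       # [8000, 11500, 15000, 18500, 22000]
print(sum(per_step))  # 75000
```

Under these assumptions, five steps send 75,000 input tokens in total even though only 17,500 new tokens were produced, because earlier content is resent on every step.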
Three token types — why they matter:
- Input — everything you send to the model for the first time (prompt, context, tool results)
- Cached input — a repeated prefix the model has already seen in a previous turn of the same session. ~10× cheaper than fresh input.
- Output — what the model generates. ~6× more expensive than fresh input, ~60× more expensive than cached input.
Example from current real pricing (USD per 1M tokens):
| Model | Cached input (short context) | Input (short context) | Output (short context) | Cached input (long context) | Input (long context) | Output (long context) |
|---|---|---|---|---|---|---|
| gpt-5.5 | $0.50 | $5.00 | $30.00 | $1.00 | $10.00 | $45.00 |
| gpt-5.4 | $0.25 | $2.50 | $15.00 | $0.50 | $5.00 | $22.50 |
| gpt-5.4-mini | $0.075 | $0.75 | $4.50 | — | — | — |
| gpt-5.4-nano | $0.02 | $0.20 | $1.25 | — | — | — |
Mini and nano do not have long context.
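As a rough illustration, the table can be turned into a per-request cost estimate. The sketch below copies the short-context prices from the table. The `request_cost` helper and the token counts in the example are assumptions made for illustration, not part of any vendor API.

```python
# USD per 1M tokens, short-context prices copied from the table above.
PRICES = {
    "gpt-5.5":      {"cached": 0.50,  "input": 5.00, "output": 30.00},
    "gpt-5.4":      {"cached": 0.25,  "input": 2.50, "output": 15.00},
    "gpt-5.4-mini": {"cached": 0.075, "input": 0.75, "output": 4.50},
    "gpt-5.4-nano": {"cached": 0.02,  "input": 0.20, "output": 1.25},
}


def request_cost(model: str, cached: int, fresh: int, output: int) -> float:
    """Estimate the USD cost of one request from its three token counts."""
    p = PRICES[model]
    total = cached * p["cached"] + fresh * p["input"] + output * p["output"]
    return total / 1_000_000


# Hypothetical request: 100k cached prefix, 20k fresh input, 2k output.
cost = request_cost("gpt-5.4", cached=100_000, fresh=20_000, output=2_000)
print(f"${cost:.3f}")  # $0.105
```

In this example the 2,000 output tokens cost $0.03, more than the 100,000 cached tokens at $0.025.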
Ballpark rules:
- Cache : Input : Output ≈ 1 : 10 : 60
- Each tier down is ~2–4× cheaper (2× from gpt-5.5 to gpt-5.4, ~3.3–3.75× for each step below that)
- Long context makes input/cache ~2× more expensive and output ~1.5× more expensive
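- Cache is why /compact is not free: compaction rewrites the conversation prefix, so the summarized history is billed as fresh input on the next turn instead of as cached input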
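The last rule can be made concrete with a rough break-even sketch in relative price units (cache : input : output = 1 : 10 : 60). It rests on several assumptions that may not match a given tool: compaction reads the existing history once at the cached rate, the summary is billed as output, the summary is then resent once as fresh input, and new messages per turn are ignored. Function names and token counts are hypothetical.

```python
CACHED, FRESH, OUTPUT = 1, 10, 60  # relative prices from the ballpark ratio


def cost_without_compact(history: int, turns: int) -> int:
    """Relative cost of `turns` further turns that each re-read the cached history."""
    return turns * history * CACHED


def cost_with_compact(history: int, summary: int, turns: int) -> int:
    """Relative cost of compacting first, then running `turns` turns on the summary."""
    compact = history * CACHED + summary * OUTPUT  # read history once, write summary
    first_turn = summary * FRESH                   # new prefix, so a cache miss
    later_turns = (turns - 1) * summary * CACHED   # summary is cached afterwards
    return compact + first_turn + later_turns


# Hypothetical session: 80k tokens of history compacted into a 4k summary.
for turns in (1, 5, 10):
    print(turns,
          cost_without_compact(80_000, turns),
          cost_with_compact(80_000, 4_000, turns))
# 1 80000 360000
# 5 400000 376000
# 10 800000 396000
```

Under these assumptions compaction costs more than it saves for the first few turns and only pays off from about the fifth turn onward, so it is worth running when a long session is expected to continue.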