Every time you press enter in Claude Code, the harness assembles a complete document and ships it to the Anthropic API. The document is rebuilt from scratch each turn — but most of it is identical to the previous turn. Understanding what goes in it, and how it grows over the course of a session, is the foundation for the next two articles: why the prompt cache matters (Part 4), and why compaction is necessary (Part 5).
"200K context window" is a shorthand that's slightly misleading. The number varies by model — 200K is the default, and 1M is available for Claude Sonnet 4.6 and Opus 4.6. More precisely, the context window is the input limit. Output is a separate budget: up to 64K tokens for most models, 128K for Opus. The two are independent — a full 200K input doesn't reduce how many output tokens you can receive.
The effective input budget is also smaller than the advertised window. Claude Code triggers compaction before the conversation fills the input window — it reserves 20K tokens of headroom for the compaction summary output, plus a 13K safety buffer. The autocompact threshold is therefore 200K − 20K − 13K = 167K tokens.
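As a quick sketch, the threshold arithmetic (constants taken from the paragraph above):

```python
CONTEXT_WINDOW = 200_000   # default input limit
SUMMARY_RESERVE = 20_000   # headroom reserved for the compaction summary output
SAFETY_BUFFER = 13_000     # extra margin before the hard limit

autocompact_threshold = CONTEXT_WINDOW - SUMMARY_RESERVE - SAFETY_BUFFER
print(autocompact_threshold)  # 167000
```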
An Anthropic API request has three top-level fields: `system`, `tools`, and `messages`. All three count against the input window. Click any segment to see what's actually inside.
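A minimal sketch of that envelope as a request body. The field values here are placeholders (the real system prompt and tool schemas are far larger, and the model id is illustrative); only the three top-level field names are the point:

```python
import json

# Hypothetical, simplified request body. system and tools are static per
# session; messages is the only part that grows turn to turn.
request = {
    "model": "claude-sonnet-4-6",          # placeholder model id
    "max_tokens": 8192,                    # output budget, separate from input
    "system": "You are Claude Code...",    # stand-in for the real system prompt
    "tools": [                             # stand-in for the real tool schemas
        {
            "name": "Read",
            "description": "Read a file from disk",
            "input_schema": {
                "type": "object",
                "properties": {"file_path": {"type": "string"}},
            },
        },
    ],
    "messages": [
        {"role": "user", "content": "what's in package.json?"},
    ],
}

# All three top-level sections count against the input window.
print(sorted(k for k in request if k in ("system", "tools", "messages")))
```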
Most of the envelope doesn't change turn to turn. The system prompt is the same. The tool schemas are the same. What grows is the messages array — every turn adds at least one user/assistant pair, and tool calls add more.
Each tool call leaves two permanent blocks in the array. "What's in package.json?" produces three messages, two of which the user never sees:
```js
// user turn
{ role: "user", content: "what's in package.json?" }

// assistant turn — text + tool_use
{ role: "assistant", content: [
  { type: "text", text: "let me read it" },
  { type: "tool_use", name: "Read", input: {...} }
] }

// user turn — tool_result
{ role: "user", content: [
  { type: "tool_result", content: "...file contents..." }
] }
```
Both new blocks stay in the messages array for the rest of the session. So does the assistant's text reply itself — every "let me look at this", every analysis paragraph, every summary at the end of a tool sequence. Claude's replies aren't free; they accumulate the same way tool results do. With extended thinking enabled, those reasoning blocks persist too. A typical assistant turn is a few hundred to a couple thousand tokens of text alongside any tool calls.
Parallel tool calls stack the same way — three simultaneous Greps add three `tool_use` blocks in one assistant message and three `tool_result` blocks in the next user message. Six new permanent blocks from one logical action.
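A sketch of how one such turn lands in the messages array. Block contents and the search patterns are invented for illustration; the shape follows the structure described above:

```python
# One assistant turn with three parallel tool calls, then the user turn
# carrying the three results: six permanent tool blocks from one action.
patterns = ("auth", "layout", "payment")   # hypothetical grep patterns

messages = []

# assistant turn: one text block plus three tool_use blocks
messages.append({"role": "assistant", "content": (
    [{"type": "text", "text": "searching..."}]
    + [{"type": "tool_use", "name": "Grep", "input": {"pattern": p}}
       for p in patterns]
)})

# next user turn: the three matching tool_result blocks
messages.append({"role": "user", "content": [
    {"type": "tool_result", "content": f"...matches for {p}..."}
    for p in patterns
]})

tool_blocks = sum(
    1 for m in messages for b in m["content"]
    if b["type"] in ("tool_use", "tool_result")
)
print(tool_blocks)  # 6
```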
Even with assistant text in the mix, in tool-heavy sessions the tool results still dominate. Below is a realistic 16-turn session (debugging + documentation), with every tool call's contribution shown as its own labeled chunk and the dialogue (user + assistant text + tool_use blocks) shown as the dim grey strips. Press play, step through, or drag the scrubber.
rates below come from the anthropic pricing page (Claude Sonnet 4.6 and Claude Opus 4.6 rows, retrieved 2026-04-24). Input only — output tokens would add the same fixed amount in both scenarios, so they're excluded.
| | sonnet 4.6 | opus 4.6 |
|---|---|---|
| base input | $3 / MTok | $5 / MTok |
| 5-min cache write (1.25×) | $3.75 / MTok | $6.25 / MTok |
| cache read (0.10×) | $0.30 / MTok | $0.50 / MTok |
per-turn math: no-cache call = envelope × base input · with-cache call = previous envelope × cache read + new delta × 5-min cache write. output tokens excluded.
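That per-turn math as a sketch, using the Sonnet 4.6 rates from the table. The envelope sizes in the example call are illustrative:

```python
# Sonnet 4.6 input rates in $/MTok, from the pricing table above.
BASE, CACHE_WRITE, CACHE_READ = 3.00, 3.75, 0.30

def no_cache_cost(envelope_tokens):
    """Whole envelope billed at the base input rate."""
    return envelope_tokens / 1e6 * BASE

def cached_cost(prev_envelope, delta):
    """Previously seen prefix at the cache-read rate,
    the new tokens at the 5-min cache-write rate."""
    return prev_envelope / 1e6 * CACHE_READ + delta / 1e6 * CACHE_WRITE

# e.g. a turn where the envelope reaches 100K tokens, 6K of them new
print(f"{no_cache_cost(100_000):.4f}")      # 0.3000
print(f"{cached_cost(94_000, 6_000):.4f}")  # 0.0507
```

With caching, the bulk of the bill shifts from the full envelope to the small per-turn delta, which is why the gap widens as the envelope grows.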
baseline (~22K: system+skills+memory 9K, tool schemas 8K, msg[0] 5K) is calibrated against a live /context reading in a fresh Sonnet 4.6 session, no MCP servers connected. after 16 turns (12 of debugging a layout + auth bug, 4 more asking claude to survey the payment system and run tests) the envelope sits at ~113K (~68% of threshold). tool results still dominate per-turn growth — but notice turn 15, where an 8K model-authored summary is the entire contribution.
Toggle the cumulative cost view on the animation above. If every token were billed at the full input rate, cumulative cost would grow quadratically — the envelope grows roughly linearly each turn, and you pay for the whole prefix every turn. In practice the server charges 10% of the input rate for prefix tokens it has already seen. Dotted lines show the same session on Opus 4.6 (~1.67× Sonnet rates).
Two rules of thumb for using Claude Code:
Keep the context narrow. Tool results — file reads, bash output, grep matches — fill the envelope far more than prompts do, and they stay there until compaction. Pointed reads and targeted searches are cheaper per turn and buy runway before the 167K autocompact threshold trips.
Keep sessions short. The uncached cost curve is quadratic in turns, so a session that sprawls across two unrelated tasks is markedly more expensive than splitting them — even with caching, the savings gap widens as the envelope grows. Very long contexts also correlate with worse output — Chroma's Context Rot study found every one of 18 frontier models (Claude 4 included) degrades with input length, well before the window is full. A fresh session pays the ~22K baseline again, but the clean slate is usually worth it.
The next articles dig into how caching and compaction actually work.