deep dive · claude code · part 3 — context

2026-04-23 · 10 min read · #claude-code #source-dive #agents

Every time you press enter in Claude Code, the harness assembles a complete document and ships it to the Anthropic API. The document is rebuilt from scratch each turn — but most of it is identical to the previous turn. Understanding what goes in it, and how it grows over the course of a session, is the foundation for the next two articles: why the prompt cache matters (Part 4), and why compaction is necessary (Part 5).

the context window question

"200K context window" is a shorthand that's slightly misleading. The number varies by model — 200K is the default, and 1M is available for Claude Sonnet 4.6 and Opus 4.6. More precisely, the context window is the input limit. Output is a separate budget: up to 64K tokens for most models, 128K for Opus. The two are independent — a full 200K input doesn't reduce how many output tokens you can receive.
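The independence of the two budgets can be sketched as a toy check (constants from the paragraph above; the function is illustrative, not the API's actual validation):

```python
# illustrative budget check: input window and output budget are independent
CONTEXT_WINDOW = 200_000  # input limit (1M on the long-context models)
MAX_OUTPUT = 64_000       # separate output budget (128K for Opus)

def request_fits(input_tokens: int, max_tokens: int) -> bool:
    # checked separately: a nearly-full input window does not
    # shrink how many output tokens the request may produce
    return input_tokens <= CONTEXT_WINDOW and max_tokens <= MAX_OUTPUT

print(request_fits(199_000, 64_000))  # True: full input, full output
print(request_fits(210_000, 1_000))   # False: input over the window
```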

The effective input budget is also smaller than the advertised window. Claude Code triggers compaction before the conversation fills the input window — it reserves 20K tokens of headroom for the compaction summary output, plus a 13K safety buffer. The autocompact threshold is:

[figure: how the harness slices the context window · same 33K reserve, very different proportions. 200K model: 167K usable (83.5% of window), 33K reserved (16.5%). 1M model, same absolute scale: 967K usable (96.7% of window), the same 33K reserve is only 3.3%. reserve = 13K safety buffer + 20K compaction output reserve, fixed by the harness; its proportional cost shrinks as the window grows.]
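The arithmetic is simple enough to write down directly (constants from the article; the function itself is a sketch, not the harness's actual code):

```python
# autocompact threshold sketch: window minus the fixed 33K reserve
COMPACTION_RESERVE = 20_000  # headroom for the compaction summary output
SAFETY_BUFFER = 13_000       # additional fixed safety margin

def autocompact_threshold(window: int) -> int:
    return window - COMPACTION_RESERVE - SAFETY_BUFFER

print(autocompact_threshold(200_000))    # 167000 (83.5% of the window)
print(autocompact_threshold(1_000_000))  # 967000 (96.7% of the window)
```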

what fills the envelope

An Anthropic API request has three top-level fields: system, tools, and messages. All three count against the input window. Click any segment to see what's actually inside.

an api request · three top-level fields, and what each one actually contains:

system
- static · intro & identity
- static · # System (text output, permissions, hooks)
- static · # Doing tasks (engineering guidance)
- static · # Executing actions with care
- static · # Using your tools
- static · # Tone & style + # Output efficiency
- — SYSTEM_PROMPT_DYNAMIC_BOUNDARY —
- dynamic · # Environment (cwd, platform, model, cutoff)
- dynamic · # auto memory (model-authored MEMORY.md)
- dynamic · # MCP server instructions (per connected server)
- dynamic · git status (appended at the end via appendSystemContext)

tools
- ~16 built-in tool schemas (Read, Bash, Edit, Grep, …)
- connected MCP tool schemas (per-server, prefixed mcp__<server>__<tool>)

messages
- [0] synthetic user msg · <system-reminder> with CLAUDE.md + today's date
- [1...N] real conversation · text + tool_use + tool_result blocks

the evolving envelope

Most of the envelope doesn't change turn to turn. The system prompt is the same. The tool schemas are the same. What grows is the messages array — every turn adds at least one user/assistant pair, and tool calls add more.

Each tool call leaves two permanent blocks in the array. "What's in package.json?" produces three messages, two of which the user never sees:

// user turn
{ role: "user",
  content: "what's in package.json?" }

// assistant turn — text + tool_use
{ role: "assistant", content: [
    { type: "text",
      text: "let me read it" },
    { type: "tool_use", id: "toolu_…",
      name: "Read", input: {...} }
] }

// user turn — tool_result, linked back by tool_use_id
{ role: "user", content: [
    { type: "tool_result",
      tool_use_id: "toolu_…",
      content: "...file contents..." }
] }

Both new blocks stay in the messages array for the rest of the session. So does the assistant's text reply itself — every "let me look at this", every analysis paragraph, every summary at the end of a tool sequence. Claude's replies aren't free; they accumulate the same way tool results do. With extended thinking enabled, those reasoning blocks persist too. A typical assistant turn is a few hundred to a couple thousand tokens of text alongside any tool calls.

Parallel tool calls stack the same way — three simultaneous Greps add three tool_use blocks in one assistant message and three tool_result blocks in the next user message. Six new permanent blocks from one logical action.
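That stacking can be sketched with hypothetical message shapes mirroring the blocks above (the ids and search patterns are invented for illustration):

```python
# one logical action: three parallel Greps, six permanent tool blocks
messages = [
    {"role": "user", "content": "where is the auth middleware?"},
]

# assistant turn: one text block plus three parallel tool_use blocks
tool_uses = [
    {"type": "tool_use", "id": f"toolu_{i}", "name": "Grep", "input": {"pattern": p}}
    for i, p in enumerate(["authMiddleware", "requireAuth", "verifyToken"])
]
messages.append({
    "role": "assistant",
    "content": [{"type": "text", "text": "let me search"}] + tool_uses,
})

# the matching results come back as one user message with three tool_result blocks
messages.append({
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": t["id"], "content": "...matches..."}
        for t in tool_uses
    ],
})

# count the tool blocks added by this one action
tool_blocks = sum(
    1
    for b in messages[-2]["content"] + messages[-1]["content"]
    if b["type"] in ("tool_use", "tool_result")
)
print(tool_blocks)  # 6
```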

Even with assistant text in the mix, in tool-heavy sessions the tool results still dominate. Below is a realistic 16-turn session (debugging + documentation), with every tool call's contribution shown as its own labeled chunk and the dialogue (user + assistant text + tool_use blocks) shown as the dim grey strips. Press play, step through, or drag the scrubber.

[interactive: 16-turn session animation · turn 0/16 starts at 22,000 / 200K tokens (13% of the 167K threshold). the main bar plots system · tool schemas · msg[0] · per-turn chunks against the 167K autocompact line and the 200K window; a zoom strip breaks the current turn into user / assistant / tool_use / tool_result chunks; a toggle swaps per-chunk detail for a running cost comparison. press play, step, or drag the scrubber.]
calibration notes & sources of variance

baseline (~22K: system+skills+memory 9K, tool schemas 8K, msg[0] 5K) is calibrated against a live /context reading in a fresh Sonnet 4.6 session, no MCP servers connected. after 16 turns (12 of debugging a layout + auth bug, 4 more asking claude to survey the payment system and run tests) the envelope sits at ~113K (~68% of threshold). tool results still dominate per-turn growth — but notice turn 15, where an 8K model-authored summary is the entire contribution.

your numbers will vary. the shape of growth (baseline + slow dialog + spiky tool results) is stable; the magnitudes aren't. main sources of variance:

- CLAUDE.md size · can swing msg[0] from 0.5K to 25K+
- MCP servers · can balloon tool schemas from 8K to 50K+
- file reads & bash output · scale with project size and what gets run
- memory + skills · depend on auto-memory usage and which skills surface

what stays constant across users: the static behavioral sections (~6K), the boundary marker, and the autocompact formula.

Toggle the cumulative cost view on the animation above. If every token were billed at the full input rate, cumulative cost would grow quadratically: the envelope grows roughly linearly each turn, and you pay for the whole prefix every turn. In practice the server bills prefix tokens it has already seen at 10% of the input rate. Dotted lines show the same session on Opus 4.6 (~1.67× Sonnet rates).
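A rough sketch of that comparison, assuming a flat ~6K-per-turn growth from the ~22K baseline and an illustrative $3/MTok input rate (both assumptions for the sake of the shape, not measured values):

```python
# cumulative input cost: full-rate rebilling vs 10% cached-prefix rate
BASELINE, GROWTH = 22_000, 6_000  # assumed envelope shape, tokens
RATE = 3.0 / 1_000_000            # illustrative $ per input token

def cumulative_cost(turns: int, cached: bool) -> float:
    total, prev = 0.0, 0
    for t in range(turns):
        envelope = BASELINE + GROWTH * t
        if cached:
            # previously-seen prefix billed at 10%, only the new tail at full rate
            total += (prev * 0.1 + (envelope - prev)) * RATE
        else:
            # whole envelope rebilled at the full rate, every turn: quadratic
            total += envelope * RATE
        prev = envelope
    return total

print(round(cumulative_cost(16, cached=False), 2))  # 3.22
print(round(cumulative_cost(16, cached=True), 2))   # 0.62
```

The uncached curve sums an arithmetic series of envelopes, which is where the quadratic growth in turns comes from; caching flattens it back toward linear.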

takeaway

Two rules of thumb for using Claude Code:

Keep the context narrow. Tool results — file reads, bash output, grep matches — fill the envelope far more than prompts do, and they stay there until compaction. Pointed reads and targeted searches are cheaper per turn and buy runway before the 167K autocompact threshold trips.

Keep sessions short. The uncached cost curve is quadratic in turns, so a session that sprawls across two unrelated tasks is markedly more expensive than splitting them — even with caching, the savings gap widens as the envelope grows. Very long contexts also correlate with worse output — Chroma's Context Rot study found every one of 18 frontier models (Claude 4 included) degrades with input length, well before the window is full. A fresh session pays the ~22K baseline again, but the clean slate is usually worth it.

The next articles dig into how caching and compaction actually work.

references