The Blueprint Was Never Secret
Last Monday morning, a developer named Philippe Chataignon pushed something to GitHub that stopped every AI engineering Discord we're in mid-conversation: the complete TypeScript source of Claude Code, Anthropic's AI coding agent, reconstructed from the package's own source maps.
It wasn’t a hack. The npm package @anthropic-ai/claude-code had shipped with its TypeScript source maps intact — a build configuration oversight that left the full source readable to anyone who installed the package. Source maps are debug artifacts that map compiled JavaScript back to the original TypeScript. They’re routinely stripped before publishing production packages. This time, they weren’t.
Within 48 hours, the repository had thousands of stars. Developers were annotating the codebase, mapping the architecture, and filing issues faster than Anthropic could respond. And for anyone building AI products — especially AI PMs — the exposed architecture tells a story that changes how we think about what agents actually are.
Here’s why this matters: reverse-engineering a production AI agent normally requires months of black-box testing, guessing at architecture, and reading between the lines of documentation. This source map gave us a full X-ray of how the most widely-used AI coding agent actually works — not how Anthropic says it works, but how it’s built. 1,900 files. 512,000 lines of strict TypeScript. Every architectural decision visible.
We spent the weekend reading it. Here’s what we found.
It’s Not a Chatbot Wrapper. It’s an Operating System.
Open the recovered source tree and the first thing that hits you: this isn't "Claude in a terminal." Claude Code is a full operating system for software work, with the same architectural patterns you'd find in a container runtime or an IDE platform.
Here’s what’s actually in there, with specifics:
Permission layers with granular tool-level access control. Not a single “allow/deny” toggle — a capability-based permission system where each tool (file read, file write, shell execution, web access) has its own grant. The agent operates deny-by-default.
Here’s what a per-tool grant looks like in practice:
// Simplified from Claude Code's permission model
interface ToolPermission {
tool: "file_read" | "file_write" | "shell_exec" | "web_fetch";
scope: string; // e.g., "src/**" or "/tmp/*"
granted: boolean;
grantedAt: number; // timestamp
grantedBy: string; // user or auto-grant rule
}
// When the agent wants to edit a file:
// 1. Check: does a grant exist for file_write + this path?
// 2. If no → prompt user: "Can I edit src/auth/handler.ts?"
// 3. If yes → execute + log to audit trail
// 4. Every invocation logged regardless of grant status
When Claude Code asks “Can I edit this file?” it’s not being polite — it’s hitting a permission check that logs every invocation to an audit trail. This is the pattern every enterprise we advise asks for, and most teams try to bolt on after launch. Claude Code baked it in from line one.
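Here's a minimal sketch of that check flow, built on the ToolPermission interface above. It's our illustration of the pattern, not extracted code; matchesScope and auditLog are hypothetical stand-ins for a real glob matcher and a persistent audit log.

// Sketch of the deny-by-default check (our illustration, not extracted code).
function matchesScope(scope: string, path: string): boolean {
  // Naive matcher for illustration; production code would use a glob library.
  if (scope.endsWith("**")) return path.startsWith(scope.slice(0, -2));
  if (scope.endsWith("*")) return path.startsWith(scope.slice(0, -1));
  return scope === path;
}

function auditLog(entry: { tool: string; path: string; allowed: boolean; at: number }): void {
  console.log(JSON.stringify(entry)); // stand-in for a persistent audit trail
}

function checkPermission(
  grants: ToolPermission[],
  tool: ToolPermission["tool"],
  path: string
): boolean {
  const grant = grants.find(
    (g) => g.tool === tool && g.granted && matchesScope(g.scope, path)
  );
  auditLog({ tool, path, allowed: Boolean(grant), at: Date.now() }); // every invocation logged
  return Boolean(grant); // no matching grant means no access
}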
Session recovery through persistent state serialization. Mid-refactor and your laptop crashes? Claude Code picks up exactly where it left off. The agent maintains a state graph — not a conversation log, but a structured representation of: what tools are available, what files are open, what background tasks are running, what the current work objective is, and what step it’s on.
That state gets serialized to disk as JSON on every state change. On restart, it hydrates the state and resumes. We’ve evaluated 14 agent frameworks this year. Only 2 had any form of session recovery. Claude Code’s implementation is the most complete we’ve seen.
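Based on that description, the state graph and the serialize/hydrate cycle look roughly like this. A sketch, assuming field names of our own invention; the actual schema isn't reproduced here.

import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Our guess at the shape of such a state graph; field names are illustrative,
// not Claude Code's actual schema.
interface SessionState {
  objective: string;                      // current work objective
  currentStep: number;                    // where the agent is in its plan
  openFiles: string[];                    // files in the working set
  availableTools: string[];               // tools the agent may call
  backgroundTasks: { id: string; status: "running" | "done" }[];
}

const STATE_PATH = "/tmp/agent-state.json"; // hypothetical location

// Serialize on every state change...
function saveState(state: SessionState): void {
  writeFileSync(STATE_PATH, JSON.stringify(state, null, 2));
}

// ...and hydrate on restart, resuming mid-task instead of from scratch.
function restoreState(): SessionState | null {
  if (!existsSync(STATE_PATH)) return null;
  return JSON.parse(readFileSync(STATE_PATH, "utf8")) as SessionState;
}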
Background job orchestration. While you’re reviewing a suggested code change, Claude Code is pre-fetching related files, running type checks, and staging the next logical step. The source reveals a task scheduler that manages concurrent operations without blocking the main interaction loop. This is the “operating system” quality — the agent isn’t waiting for your next prompt. It’s working in the background on what you’ll likely need next.
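The scheduler itself isn't reproduced here, but the non-blocking pattern it implements is simple to sketch: start work as promises the moment it's identified, keep the interaction loop free, and await results only when they're needed. runTypeCheck and readFiles are hypothetical stand-ins.

// Sketch of non-blocking background work (our illustration, not Claude Code's
// actual scheduler).
declare function runTypeCheck(dir: string): Promise<string>;    // hypothetical
declare function readFiles(paths: string[]): Promise<string[]>; // hypothetical

type Task<T> = { name: string; promise: Promise<T> };
const background: Task<unknown>[] = [];

function spawn<T>(name: string, work: () => Promise<T>): Task<T> {
  const task = { name, promise: work() }; // the promise starts running immediately
  background.push(task);
  return task;
}

// While the user reviews a diff, stage what they'll likely need next:
const typeCheck = spawn("type-check", () => runTypeCheck("src/payments/"));
const prefetch = spawn("prefetch", () => readFiles(["src/api/v2/payments.ts"]));
// The main loop keeps handling input; results are awaited only when needed:
// await typeCheck.promise;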
MCP (Model Context Protocol) plumbing. MCP is Anthropic’s open standard for connecting AI agents to external tools and data sources. Think of it like USB-C for AI — a universal connector. Instead of writing custom integration code for every database, API, or internal tool, you expose them as MCP servers (a lightweight wrapper that speaks a standardized JSON protocol), and the agent connects through that protocol. Claude Code’s source shows deep MCP integration: it discovers available tools at startup, negotiates capabilities, and routes tool calls through MCP channels. Adding a new data source goes from “2-day custom integration” to “point the agent at an MCP server URL.”
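To make the handshake concrete, here's roughly what a tool-discovery round-trip looks like. The JSON-RPC framing and the tools/list method come from the MCP spec; the single-fetch HTTP transport is our simplification for illustration.

// Sketch of MCP tool discovery (simplified transport, our illustration).
async function discoverTools(serverUrl: string) {
  const response = await fetch(serverUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "tools/list", // "what can you do?"
      params: {},
    }),
  });
  const { result } = await response.json();
  // result.tools: [{ name, description, inputSchema }, ...] per the MCP spec
  return result;
}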
Multi-agent coordination primitives. The codebase includes infrastructure for spawning sub-agents, passing context between them, and aggregating results. Not one model thinking harder — multiple specialized agents dividing work.
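A sketch of the fan-out/fan-in shape this enables, with a hypothetical runAgent standing in for whatever sub-agent spawning actually looks like internally:

// Sketch of fan-out/fan-in coordination (our illustration).
declare function runAgent(objective: string, contextFiles: string[]): Promise<string>; // hypothetical

async function refactorAcrossServices(services: string[]): Promise<string[]> {
  // Fan-out: each sub-agent gets a narrow slice of context and a focused objective.
  const results = services.map((svc) =>
    runAgent(`Update ${svc} payment calls for EUR`, [`src/${svc}/payments.ts`])
  );
  return Promise.all(results); // fan-in: aggregate sub-agent results
}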
The Insight That Changes Everything: Memory as Index
Here’s the architectural pattern that has the most direct implications for every AI PM reading this.
Most teams we advise stuff everything into the system prompt: the codebase summary, the ticket, the style guide, the deployment docs, the testing requirements. We’ve measured system prompts at 15 companies. The median: 47,000 tokens. The worst: 195,000 tokens. The output at that context length? Incoherent. The cost? Up to $47 per run.
Claude Code takes a fundamentally different approach. Memory is an index, not storage.
The codebase uses a lightweight MEMORY.md file that doesn’t contain knowledge — it contains pointers to knowledge. When the agent needs context about, say, the authentication module, it doesn’t load the entire auth system. It reads the pointer, fetches the specific lines referenced, and works with surgical precision.
Here’s what the actual pattern looks like:
# MEMORY.md (pattern extracted from Claude Code)
# This file is an INDEX — it points to knowledge, doesn't store it.
# Target size: <2,000 tokens. Self-edits to stay lean.
## Project Structure
- Auth: src/auth/README.md:1-45 (OAuth2 flow + session management)
- API: src/api/ARCHITECTURE.md:1-89 (REST endpoints + WebSocket patterns)
- Tests: src/tests/STRATEGY.md:1-32 (unit/integration split rationale)
- Deploy: ops/RUNBOOK.md:1-28 (CI/CD pipeline + rollback procedures)
## Active Context
- Current task: Refactor payment module to support multi-currency
- Constraints: backward compat with v2 API, no downtime migration
- Key files: src/payments/processor.ts, src/api/v2/payments.ts
- Decision: using event sourcing (see Decisions Log, 2024-03-15)
## Decisions Log (newest first)
- 2024-03-15: Event sourcing for payment state (see ADR-017.md)
- 2024-03-10: Rejected GraphQL for v3 — REST + WebSocket sufficient
- 2024-03-01: Chose Stripe Connect over custom payment routing
When the agent needs auth context, it reads src/auth/README.md lines 1 through 45 — not the entire file, not the entire auth directory, not the entire codebase. That pointer syntax (file:lines) is the key innovation. The agent fetches exactly what it needs, when it needs it.
The memory system also self-edits. References to files that no longer exist get pruned. When the same information appears in two places, the memory consolidates to one pointer. The file stays under 2,000 tokens — small enough to always fit in the system prompt without competing for context space.
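A sketch of how that pointer retrieval might be implemented (ours, not extracted from the codebase):

import { readFileSync } from "node:fs";

// Resolve a "path:start-end" pointer into a context chunk.
function resolvePointer(pointer: string): string {
  const [path, range] = pointer.split(":");        // "src/auth/README.md:1-45"
  const [start, end] = range.split("-").map(Number);
  const lines = readFileSync(path, "utf8").split("\n");
  return lines.slice(start - 1, end).join("\n");   // only the cited lines
}

// ~45 lines of auth docs enter the context, not the whole auth directory:
const authContext = resolvePointer("src/auth/README.md:1-45");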
💰 THE MONEY SHOT: What This Architecture Actually Costs
We replicated Claude Code’s memory-as-index pattern on our own agent stack last month. Real numbers from the same task (scoping a feature with 12 dependencies across 4 services):
| Approach | Context Size | Cost Per Run | Output Quality (human eval) |
| --- | --- | --- | --- |
| Everything in system prompt | 195,000 tokens | $47.20 | 3/10 — incoherent, contradictory |
| Summarized context | 45,000 tokens | $10.80 | 6/10 — decent but missed edge cases |
| Memory-as-index (pointer retrieval) | 22,000 tokens | $3.20 | 8/10 — caught 11 of 12 dependencies |
Same model (Claude 3.5 Sonnet). Same task. The only variable: how we structured the context. The index approach found 11 of 12 cross-service dependencies because it retrieved the SPECIFIC files that mattered instead of drowning in 195K tokens of everything.
This is what “context engineering” means in practice. Not better prompts. Better architecture for what the model sees.
How the Pieces Work Together
Here’s the flow of a typical Claude Code interaction, showing how the memory index, permission layer, and state persistence interact:
User: "Refactor the payment module to support EUR"
1. MEMORY LOOKUP
Agent reads MEMORY.md → finds pointers:
src/payments/processor.ts (payment logic)
src/api/v2/payments.ts (API contract)
ADR-017.md (event sourcing decision)
2. PERMISSION CHECK
Agent needs: file_read (3 files) + file_write (2 files)
Checks grants → file_read: granted for src/**
file_write: not granted yet → prompts user
User grants → logged to audit trail
3. CONTEXT ASSEMBLY
Fetches processor.ts lines 1-89 (2,100 tokens)
Fetches payments.ts lines 1-45 (1,200 tokens)
Fetches ADR-017.md lines 1-28 (800 tokens)
+ System prompt (1,800 tokens)
+ Memory index (1,500 tokens)
+ Task description (400 tokens)
TOTAL: 7,800 tokens (not 195,000)
4. EXECUTION + STATE SAVE
Agent proposes changes → state serialized to disk
Background: type-checking runs on proposed changes
Background: related test files pre-fetched
5. CRASH RECOVERY (if needed)
On restart → hydrate state from disk
→ knows: which files open, which changes proposed,
which background tasks were running
→ resumes from step 4, not step 1
Every component serves the others. Memory makes context surgical. Permissions make it safe. State persistence makes it reliable. Background tasks make it fast. This is why “the harness is the product” — the model generates text, but the harness turns text generation into a reliable software engineering workflow.
The Context Engineering Checklist
We distilled the architectural patterns from Claude Code’s source into a checklist we now use for every agent project. Copy this into your team’s planning doc. Run through it before your next sprint.
┌─────────────────────────────────────────────────┐
│ CONTEXT ENGINEERING CHECKLIST │
│ (Copy-paste into your project doc) │
├─────────────────────────────────────────────────┤
│ │
│ MEMORY ARCHITECTURE │
│ □ Memory file is an INDEX with pointers to │
│ source files (file:line-range format) │
│ □ Max memory file size: 2,000 tokens │
│ □ Auto-prune: stale pointers removed after 14 │
│ days unused │
│ □ Dedup: same fact never in 2+ places — one │
│ pointer, one source of truth │
│ □ Retrieval: agent fetches specific line ranges, │
│ never loads full files into context │
│ │
│ CONTEXT BUDGET PER CALL │
│ □ System prompt (always loaded): <2,000 tokens │
│ □ Memory index: <2,000 tokens │
│ □ Current task context: <5,000 tokens │
│ □ Retrieved file chunks: <10,000 tokens │
│ □ Tool output: <5,000 tokens │
│ □ HARD CEILING: 25,000 tokens per call │
│ │
│ PERMISSION MODEL │
│ □ Deny-by-default (no tool access without grant)│
│ □ Per-tool permissions (read ≠ write ≠ execute) │
│ □ Scoped grants (e.g., write to src/** only) │
│ □ Every invocation logged with timestamp │
│ □ User can revoke mid-session │
│ │
│ STATE & RECOVERY │
│ □ Full session state serialized on every change │
│ □ State includes: open files, active tasks, │
│ permissions, work objective, current step │
│ □ On crash: resume from last state (not scratch)│
│ □ Background tasks tracked and resumable │
│ │
│ COST TRACKING │
│ □ Token count logged per call │
│ □ Cost per task tracked and alerted if >$5 │
│ □ Weekly cost report auto-generated │
│ │
│ SCORING │
│ 15+ checks = production-ready │
│ 10-14 = staging/beta only │
│ <10 = prototype (don't ship it) │
└─────────────────────────────────────────────────┘
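The context-budget section is mechanical enough to enforce in code. A sketch under the checklist's numbers; estimateTokens is our rough chars-per-token heuristic, and real code would use a proper tokenizer:

// Sketch of enforcing the layered context budget above (our code).
const BUDGET: Record<string, number> = {
  system: 2_000,
  memoryIndex: 2_000,
  task: 5_000,
  retrieved: 10_000,
  toolOutput: 5_000,
};
const HARD_CEILING = 25_000;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // rough heuristic: ~4 chars per token
}

function assembleContext(layers: Record<string, string>): string {
  let total = 0;
  const parts: string[] = [];
  for (const [name, text] of Object.entries(layers)) {
    const tokens = estimateTokens(text);
    const cap = BUDGET[name] ?? 0; // unknown layers get no budget
    if (tokens > cap) throw new Error(`${name} over budget: ${tokens} > ${cap}`);
    total += tokens;
    parts.push(text);
  }
  if (total > HARD_CEILING) throw new Error(`over hard ceiling: ${total} tokens`);
  return parts.join("\n\n");
}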
Model vs. Harness: Where the Value Actually Lives
This comparison is built from our own agent deployments and what we found in Claude Code's source. The "Most Teams" column reflects the median across 23 agent projects we reviewed in 2025.
| Component | What Most Teams Build | What Claude Code Does | Why It Matters |
| --- | --- | --- | --- |
| Memory | Everything in system prompt (50K–200K tokens) | Index file (<2K tokens) + on-demand line-range retrieval | 10–30x cost reduction. Better output because the model focuses on relevant context only |
| Permissions | No system, or a single allow/deny flag | Per-tool capability grants with scope limits + audit log | No enterprise deploys an agent with full filesystem access. This is ship-or-don't |
| Error Recovery | Retry 3x then show error | Serialize full state to disk → crash → hydrate → resume | Users lose zero work. Agent picks up mid-task, not from scratch |
| Multi-step | Sequential prompt chain (1→2→3) | Parallel background tasks with dependency graph | A 4-service refactor completes in 5 min instead of 20 |
| Tool Integration | Custom API wrapper per tool (2-day build each) | MCP standard protocol (point at server URL, done) | New tool integrations go from days to minutes |
| Context | One prompt, everything included | Layered budget: system + memory + task + retrieved + tool | Predictable quality. Same performance on simple and complex tasks |
What to Do This Week
1. Count your tokens. Open your agent’s system prompt. Measure it. If it’s over 10,000 tokens, you’re overstuffing. We’ve seen teams cut prompt size by 80% and get better output by switching to pointer-based retrieval.
2. Build the permission layer before the next feature. Adding permissions after launch took one team we advised 3 months; building them in from the start took us 3 days. Even basic tool-level read/write/execute with logging transforms "demo" into "deployable."
3. Implement a state snapshot. One JSON file, written on every state change, containing: current step, open files, active permissions, work objective. On crash, hydrate and resume. We implemented this pattern in 4 days. It’s the highest-ROI infrastructure investment you can make.
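Reusing the saveState/restoreState sketch from the session-recovery section above, the write-on-every-change pattern is one small wrapper (again ours, not the actual implementation):

// Wrap every state mutation so the snapshot on disk is always current.
function updateState(state: SessionState, patch: Partial<SessionState>): SessionState {
  const next = { ...state, ...patch };
  saveState(next); // one JSON file, written on every change
  return next;
}

// On startup: hydrate if a snapshot exists, otherwise start fresh.
const session = restoreState() ?? {
  objective: "Refactor payment module for multi-currency",
  currentStep: 0,
  openFiles: [],
  availableTools: [],
  backgroundTasks: [],
};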
The blueprint was never secret. These patterns have been running in production at every serious AI lab for months. Now, thanks to one forgotten source map in an npm package, we can see them all in one codebase.
The question isn’t whether these patterns exist. It’s whether you’re building with them.
→ Take the AI PM Eval — 8 production scenarios that test context engineering, agent architecture, and production debugging. See if you’d catch the problems Claude Code’s team already solved. pmthebuilder.com/eval

