🏛️ Claude Certified Architect: The Complete Study Guide

Anthropic's first technical certification — 60 questions, 5 domains, a 720/1000 pass mark, and a $100M partner network behind it. Everything you need to pass.

March 16, 2026 · 14 min read

📺 Watch the video version:

🎧

Listen to this article

What Is the Claude Certified Architect?

Anthropic just launched its first-ever technical certification — and backed it with one hundred million dollars. The Claude Certified Architect (CCA) — Foundations exam is a 60-question, proctored test designed for solution architects who build production AI applications with Claude.

This isn't a conceptual badge you earn by watching videos. The CCA validates practical judgment: knowing when to use programmatic enforcement vs. prompt guidance, how agentic loops should handle stop_reason, how to write MCP tool descriptions that actually work, and how to manage context in multi-agent systems that don't fall apart under load.

The target candidate has 6+ months of hands-on experience building with the Claude API, Agent SDK, Claude Code, and MCP. They've built real agentic systems and made real architectural tradeoffs. The exam tests whether they made the right ones — or whether they just got lucky.

📋 Exam At a Glance 60 multiple-choice questions · 4 scenarios (drawn from 6) · 720/1000 minimum passing score · Closed-book, proctored · No penalty for guessing

The $100M Partner Network Behind It

The CCA certification launched alongside Anthropic's Claude Partner Network — a $100 million commitment for 2026 targeting enterprise system integrators. This is not a grassroots community push. It's a deliberate credential-as-moat strategy.

The anchor partners are telling: Accenture is training 30,000 professionals, Cognizant is giving 350,000 associates access, Deloitte and Infosys round out the founding cohort. The first 5,000 partner employees get access free.

The bet is straightforward: enterprise buyers want certified practitioners. If Anthropic controls the credential, they influence who gets hired for Claude deployments — which drives more Claude deployments. AWS did this with its certifications in the early cloud era. Anthropic is running the same playbook.

For individual practitioners, the timing matters. Credentials have the most value when supply is low. The CCA cohort is small right now. That changes fast.

Exam Format & Scoring

The exam uses scenario-based multiple choice: each question has one correct answer and three distractors. Distractors are specifically designed for candidates who know the concepts but lack practical judgment — partial knowledge gets you the wrong answer.

Domain	Weight	Questions (~)
Domain 1: Agentic Architecture & Orchestration	27%	~16
Domain 2: Tool Design & MCP Integration	18%	~11
Domain 3: Claude Code Configuration & Workflows	20%	~12
Domain 4: Prompt Engineering & Structured Output	20%	~12
Domain 5: Context Management & Reliability	15%	~9

Scores are reported on a 100–1,000 scaled score. The minimum passing score is 720. Scaling accounts for slight variation in difficulty across exam forms — a 720 on a harder form is equivalent to a 720 on an easier one.

The 6 Exam Scenarios

Every question is grounded in one of six realistic production scenarios. On your exam, 4 of these 6 will appear, selected at random. You don't know which four in advance.

Scenario 1 — Customer Support Resolution Agent: Claude Agent SDK with MCP tools (get_customer, lookup_order, process_refund, escalate_to_human). Target: 80%+ first-contact resolution. Tests agentic loops, tool design, escalation patterns.
Scenario 2 — Code Generation with Claude Code: Team workflow with CLAUDE.md configs, slash commands, plan mode vs. direct execution. Tests Claude Code configuration and context management.
Scenario 3 — Multi-Agent Research System: Coordinator + web search, document analysis, synthesis, and report generation subagents. Tests orchestration, context passing, error propagation.
Scenario 4 — Developer Productivity with Claude: Codebase exploration and boilerplate generation using Agent SDK with built-in tools (Read, Write, Bash, Grep, Glob) and MCP servers. Tests tool distribution and Claude Code integration.
Scenario 5 — Claude Code for CI/CD: Automated code review, test generation, PR feedback in a CI pipeline. Tests non-interactive mode, structured output, prompt engineering for false-positive reduction.
Scenario 6 — Structured Data Extraction: Unstructured document → validated JSON output. Tests schema design, tool_use, validation-retry loops, batch processing, human review routing.

⚠️ Strategy Note You can't predict which 4 scenarios appear. Study all 6. Domains 1 (Agentic) and 2 (Tool Design) appear in at least 4 of the 6 scenarios — highest ROI for study time.

Domain 1: Agentic Architecture & Orchestration (27%)

The heaviest domain. Seven task statements covering how you design and implement systems where Claude acts autonomously over multiple steps.

What the Exam Actually Tests

Agentic loop control flow: The loop continues when stop_reason == "tool_use" and terminates when stop_reason == "end_turn". Anti-patterns: parsing natural language to detect completion, arbitrary iteration caps as the primary stop mechanism, checking for assistant text content as a completion indicator.
Multi-agent hub-and-spoke: A coordinator manages all inter-agent communication. Subagents do NOT inherit coordinator context automatically — context must be explicitly passed. Parallel subagents are spawned by emitting multiple Task tool calls in a single response.
The Task tool requirement: For a coordinator to spawn subagents, its allowedTools must include "Task". This is frequently tested.
Programmatic enforcement vs. prompt guidance: When a specific tool sequence is required for critical business logic (e.g., verifying identity before processing a refund), programmatic prerequisites provide deterministic guarantees. Prompt-based approaches are probabilistic — insufficient for financial or safety-critical operations.
SDK hooks (PostToolUse): Used to intercept and normalize tool results before the model processes them (e.g., converting Unix timestamps and ISO 8601 to a uniform format) or to block policy-violating actions (e.g., refunds over $500) and redirect to escalation.
Session management: --resume <session-name> continues a named session. fork_session creates independent branches from a shared analysis baseline to explore divergent approaches.

💡 Core Principle of Domain 1 Prompt instructions have a non-zero failure rate. Any compliance requirement with financial, legal, or safety consequences must be enforced programmatically — not via prompt.

Domain 2: Tool Design & MCP Integration (18%)

How you design tools that Claude actually uses correctly — and how you configure MCP servers.

What the Exam Actually Tests

Tool descriptions are the selection mechanism: When two tools have similar names and minimal descriptions, Claude misroutes. The fix is always richer descriptions first — include input formats, example queries, edge cases, and explicit "when to use this vs. that" guidance.
Too many tools degrades reliability: 18 tools vs. 4–5 relevant tools. More tools = more decision complexity = more misuse. Restrict each subagent's tool set to what its role requires.
Structured error responses: MCP tools should return isError: true with an errorCategory (transient/validation/permission/business), an isRetryable boolean, and a human-readable description. Generic "Operation failed" responses prevent intelligent recovery.
tool_choice options: "auto" lets Claude decide, "any" guarantees a tool is called (no conversational text), forced selection ({"type": "tool", "name": "..."}) ensures a specific tool is called first.
MCP server scoping: Project-level tooling goes in .mcp.json (version-controlled, shared). Personal/experimental servers go in ~/.claude.json. Environment variable expansion (${GITHUB_TOKEN}) keeps credentials out of the repo. Both scopes are available simultaneously.

Domain 3: Claude Code Configuration & Workflows (20%)

How you configure Claude Code for team development — from CLAUDE.md hierarchies to CI/CD integration.

What the Exam Actually Tests

CLAUDE.md hierarchy: User-level → project-level → directory-level. Instructions cascade and can be overridden. Use @import patterns to compose configs across files.
.claude/rules/ with glob patterns: YAML frontmatter specifying paths: ["src/api/**/*"] or paths: ["**/*.test.*"]. Rules load automatically when matching files are edited — deterministic, not inference-based. This is the right answer for path-specific conventions.
Custom slash commands: Project-scoped commands go in .claude/commands/ (version-controlled, available to all devs on clone). Personal commands go in ~/.claude/commands/.
Skills in .claude/skills/: SKILL.md frontmatter supports context: fork (runs in isolation), allowed-tools (restrict what the skill can use), and argument-hint.
Plan mode vs. direct execution: Plan mode for: complex multi-file changes, architectural decisions, multiple valid approaches, monolith migrations. Direct execution for: single-file bug fixes, clear well-scoped tasks. The exam tests when NOT to use plan mode as much as when to use it.
CI/CD non-interactive mode: The -p / --print flag makes Claude Code non-interactive (outputs to stdout, exits). --output-format json and --json-schema for structured CI output. CLAUDE_HEADLESS=true and --batch don't exist — common distractors.

Domain 4: Prompt Engineering & Structured Output (20%)

How you reliably extract structured data and design prompts that produce consistent, accurate results.

What the Exam Actually Tests

tool_use for structured output: Defining a JSON schema as a tool and forcing Claude to "call" it is the most reliable way to get structured output. Combine with tool_choice: "any" to guarantee the tool is called.
Schema design: Required vs. optional fields, nullable fields (prevents hallucination for absent info), enum with "other" + detail string patterns, strict mode for syntax error elimination.
Validation-retry loops: When Pydantic or JSON schema validation fails, send a follow-up request with the document, the failed extraction, and the specific validation error. Track which errors are retryable (format mismatches) vs. not (information genuinely absent from source).
Few-shot prompting: Targeted examples for ambiguous scenarios. Most effective at reducing false positives (e.g., in code review) and improving handling of structural variety in documents. Few-shot beats vague instructions; it does NOT beat tool descriptions for tool selection issues.
Message Batches API: 50% cost savings, up to 24-hour processing window, no guaranteed latency SLA. Use custom_id fields for request/response correlation. Does NOT support multi-turn tool calling within a single request. Right for: overnight reports, weekly audits, nightly test generation. Wrong for: blocking pre-merge checks, anything with latency requirements.
Multi-pass and multi-instance reviews: A model reviewing its own output retains reasoning context and is less likely to catch its own errors. An independent second Claude instance (no prior context) is more effective. For large PRs: per-file local analysis passes + separate cross-file integration pass beats single-pass on all files.

Domain 5: Context Management & Reliability (15%)

How you keep long agentic sessions accurate, coherent, and recoverable when things go wrong.

What the Exam Actually Tests

"Lost in the middle" effect: Models reliably process information at the beginning and end of long inputs. Key facts and summaries belong at the top. Detailed results with section headers follow.
Trimming verbose tool outputs: A 40-field order lookup returning all 40 fields to context is wasteful and degrades attention. Extract only relevant fields before they accumulate.
Progressive summarization risks: Condensing numerical values, percentages, dates, and customer-stated expectations into vague summaries destroys precision. Use a persistent "case facts" block with raw transactional data instead.
Scratchpad files: For long codebase exploration sessions, agents should maintain scratchpad files recording key findings. Reference them for subsequent questions to counteract context degradation. Use /compact to reduce context usage when it fills with verbose output.
Escalation triggers: Escalate when (1) the customer explicitly requests a human, (2) policy is ambiguous or silent on the specific request, or (3) the agent cannot make meaningful progress. Do NOT use: sentiment scores, self-reported confidence thresholds, or case complexity proxies. These are unreliable.
Structured error propagation: Subagents should return structured error context: failure type, attempted query, partial results, alternative approaches. The coordinator decides recovery. Anti-patterns: returning empty results as success (suppresses the error), terminating the full workflow on a single subagent failure.
Human review routing: Use field-level confidence scores calibrated against labeled validation sets. Stratified random sampling for ongoing error rate measurement. Validate accuracy by document type and field segment — aggregate accuracy metrics (e.g., "97% overall") can mask poor performance on specific sub-segments.

Sample Questions

These 12 questions are drawn from the official practice exam. Each one is a real indicator of exam difficulty and format.

Scenario: Customer Support Resolution Agent

Question 1

Production data shows that in 12% of cases, your agent skips get_customer entirely and calls lookup_order using only the customer's stated name, occasionally leading to misidentified accounts and incorrect refunds. What change would most effectively address this reliability issue?

A) Add a programmatic prerequisite that blocks lookup_order and process_refund calls until get_customer has returned a verified customer ID.
B) Enhance the system prompt to state that customer verification via get_customer is mandatory before any order operations.
C) Add few-shot examples showing the agent always calling get_customer first, even when customers volunteer order details.
D) Implement a routing classifier that analyzes each request and enables only the subset of tools appropriate for that request type.
✓ Correct Answer: A

When a specific tool sequence is required for critical business logic, programmatic enforcement provides deterministic guarantees that prompt-based approaches cannot. Options B and C rely on probabilistic LLM compliance — insufficient when errors have financial consequences. Option D addresses tool availability, not tool ordering.

Question 2

Production logs show the agent frequently calls get_customer when users ask about orders (e.g., "check my order #12345"), instead of calling lookup_order. Both tools have minimal descriptions ("Retrieves customer information" / "Retrieves order details") and accept similar identifier formats. What's the most effective first step to improve tool selection reliability?

A) Add few-shot examples to the system prompt demonstrating correct tool selection patterns, with 5–8 examples showing order-related queries routing to lookup_order.
✓ B) Expand each tool's description to include input formats it handles, example queries, edge cases, and boundaries explaining when to use it versus similar tools.
C) Implement a routing layer that parses user input before each turn and pre-selects the appropriate tool based on detected keywords and identifier patterns.
D) Consolidate both tools into a single lookup_entity tool that accepts any identifier and internally determines which backend to query.

Tool descriptions are the primary mechanism LLMs use for tool selection. When descriptions are minimal, models lack context to differentiate similar tools. Option B directly addresses the root cause. Few-shot examples (A) add token overhead without fixing the underlying issue. A routing layer (C) is over-engineered. Consolidating tools (D) is a valid architectural choice but requires more effort than a "first step" warrants.

Question 3

Your agent achieves 55% first-contact resolution, well below the 80% target. Logs show it escalates straightforward cases (standard damage replacements with photo evidence) while attempting to autonomously handle complex situations requiring policy exceptions. What's the most effective way to improve escalation calibration?

✓ A) Add explicit escalation criteria to your system prompt with few-shot examples demonstrating when to escalate versus resolve autonomously.
B) Have the agent self-report a confidence score (1–10) before each response and automatically route requests to humans when confidence falls below a threshold.
C) Deploy a separate classifier model trained on historical tickets to predict which requests need escalation before the main agent begins processing.
D) Use sentiment analysis to detect customer frustration levels and automatically escalate when negative sentiment exceeds a threshold.

Explicit criteria with few-shot examples directly addresses unclear decision boundaries. Option B fails because LLM self-reported confidence is poorly calibrated. Option C is over-engineered — prompt optimization hasn't been tried yet. Option D solves a different problem entirely; sentiment doesn't correlate with case complexity.

Scenario: Code Generation with Claude Code

Question 4

You want to create a custom /review slash command that runs your team's standard code review checklist. This command should be available to every developer when they clone or pull the repository. Where should you create this command file?

✓ A) In the .claude/commands/ directory in the project repository
B) In ~/.claude/commands/ in each developer's home directory
C) In the CLAUDE.md file at the project root
D) In a .claude/config.json file with a commands array

Project-scoped custom slash commands are stored in .claude/commands/ within the repository. They are version-controlled and automatically available to all developers on clone/pull. ~/.claude/commands/ is for personal commands. CLAUDE.md is for project instructions, not command definitions. Option D describes a configuration mechanism that doesn't exist.

Question 5

You've been assigned to restructure the team's monolithic application into microservices. This will involve changes across dozens of files and requires decisions about service boundaries and module dependencies. Which approach should you take?

✓ A) Enter plan mode to explore the codebase, understand dependencies, and design an implementation approach before making changes.
B) Start with direct execution and make changes incrementally, letting the implementation reveal the natural service boundaries.
C) Use direct execution with comprehensive upfront instructions detailing exactly how each service should be structured.
D) Begin in direct execution mode and only switch to plan mode if you encounter unexpected complexity during implementation.

Plan mode is designed for complex tasks involving large-scale changes, multiple valid approaches, and architectural decisions. Option B risks costly rework when dependencies are discovered late. Option C assumes you already know the right structure without exploration. Option D ignores that the complexity is already stated in the requirements.

Question 6

Your codebase has distinct areas with different coding conventions: React components (functional with hooks), API handlers (async/await with specific error handling), and database models (repository pattern). Test files are spread throughout alongside the code they test (e.g., Button.test.tsx next to Button.tsx), and you want all tests to follow the same conventions regardless of location. What's the most maintainable approach?

✓ A) Create rule files in .claude/rules/ with YAML frontmatter specifying glob patterns to conditionally apply conventions based on file paths.
B) Consolidate all conventions in the root CLAUDE.md file under headers for each area, relying on Claude to infer which section applies.
C) Create skills in .claude/skills/ for each code type that include the relevant conventions in their SKILL.md files.
D) Place a separate CLAUDE.md file in each subdirectory containing that area's specific conventions.

.claude/rules/ with glob patterns (e.g., **/*.test.tsx) allows conventions to be automatically applied based on file paths regardless of directory location. Option B relies on inference, making it unreliable. Option C requires manual skill invocation. Option D can't handle test files spread across many directories since CLAUDE.md files are directory-bound.

Scenario: Multi-Agent Research System

Question 7

After running the system on "impact of AI on creative industries," you observe all subagents complete successfully, but the final reports cover only visual arts, completely missing music, writing, and film. The coordinator's logs show it decomposed the topic into three subtasks: "AI in digital art creation," "AI in graphic design," and "AI in photography." What is the most likely root cause?

A) The synthesis agent lacks instructions for identifying coverage gaps in the findings it receives from other agents.
✓ B) The coordinator agent's task decomposition is too narrow, resulting in subagent assignments that don't cover all relevant domains of the topic.
C) The web search agent's queries are not comprehensive enough and need to be expanded to cover more creative industry sectors.
D) The document analysis agent is filtering out sources related to non-visual creative industries due to overly restrictive relevance criteria.

The coordinator's logs reveal the root cause directly: it decomposed "creative industries" into only visual arts subtasks. The subagents executed their assigned tasks correctly — the problem is what they were assigned. Options A, C, and D incorrectly blame downstream agents that are working correctly within their assigned scope.

Question 8

The web search subagent times out while researching a complex topic. You need to design how this failure information flows back to the coordinator agent. Which error propagation approach best enables intelligent recovery?

✓ A) Return structured error context to the coordinator including the failure type, the attempted query, any partial results, and potential alternative approaches.
B) Implement automatic retry logic with exponential backoff within the subagent, returning a generic "search unavailable" status only after all retries are exhausted.
C) Catch the timeout within the subagent and return an empty result set marked as successful.
D) Propagate the timeout exception directly to a top-level handler that terminates the entire research workflow.

Structured error context gives the coordinator the information it needs to make intelligent recovery decisions. Option B's generic status hides valuable context. Option C suppresses the error entirely, preventing any recovery and risking incomplete research. Option D terminates the entire workflow unnecessarily when recovery strategies could succeed.

Question 9

During testing, you observe the synthesis agent frequently needs to verify specific claims. Currently this adds 2–3 round trips and increases latency by 40%. Evaluation shows 85% of verifications are simple fact-checks (dates, names, statistics) while 15% require deeper investigation. What's the most effective approach to reduce overhead while maintaining reliability?

✓ A) Give the synthesis agent a scoped verify_fact tool for simple lookups, while complex verifications continue delegating to the web search agent through the coordinator.
B) Have the synthesis agent accumulate all verification needs and return them as a batch to the coordinator at the end of its pass.
C) Give the synthesis agent access to all web search tools so it can handle any verification independently.
D) Pre-cache likely verification targets before synthesis begins, based on predicted claim types from the research queries.

Option A applies least-privilege: the synthesis agent gets only what it needs for the 85% common case while preserving the existing coordination pattern for complex cases. Option B's batching creates blocking dependencies since synthesis steps may depend on earlier verified facts. Option C over-provisions the synthesis agent. Option D relies on speculative caching that can't reliably predict verification needs.

Scenario: Claude Code for Continuous Integration

Question 10

Your pipeline script runs claude "Analyze this pull request for security issues" but the job hangs indefinitely. Logs indicate Claude Code is waiting for interactive input. What's the correct approach to run Claude Code in an automated pipeline?

✓ A) Add the -p flag: claude -p "Analyze this pull request for security issues"
B) Set the environment variable CLAUDE_HEADLESS=true before running the command
C) Redirect stdin from /dev/null: claude "Analyze this pull request for security issues" < /dev/null
D) Add the --batch flag: claude --batch "Analyze this pull request for security issues"

The -p (or --print) flag is the documented way to run Claude Code in non-interactive mode: processes the prompt, outputs to stdout, and exits. Options B and D reference non-existent features. Option C uses a Unix workaround that doesn't properly address Claude Code's command syntax.

Question 11

Your team wants to reduce API costs. Two workflows currently use real-time Claude calls: (1) a blocking pre-merge check that must complete before developers can merge, and (2) a technical debt report generated overnight for review the next morning. Your manager proposes switching both to the Message Batches API for its 50% cost savings. How should you evaluate this proposal?

✓ A) Use batch processing for the technical debt reports only; keep real-time calls for pre-merge checks.
B) Switch both workflows to batch processing with status polling to check for completion.
C) Keep real-time calls for both workflows to avoid batch result ordering issues.
D) Switch both to batch processing with a timeout fallback to real-time if batches take too long.

The Batches API offers 50% savings but has up to 24-hour processing times with no guaranteed latency SLA — unsuitable for blocking pre-merge checks, ideal for overnight jobs. Option B relies on "often faster" completion, which isn't acceptable for blocking workflows. Option C reflects a misconception — batch results can be correlated using custom_id. Option D adds unnecessary complexity.

Question 12

A pull request modifies 14 files across the stock tracking module. Your single-pass review produces inconsistent results: detailed feedback for some files but superficial comments for others, obvious bugs missed, and contradictory feedback — flagging a pattern as problematic in one file while approving identical code elsewhere. How should you restructure the review?

✓ A) Split into focused passes: analyze each file individually for local issues, then run a separate integration-focused pass examining cross-file data flow.
B) Require developers to split large PRs into smaller submissions of 3–4 files before automated review runs.
C) Switch to a higher-tier model with a larger context window to give all 14 files adequate attention in one pass.
D) Run three independent review passes on the full PR and only flag issues that appear in at least two of the three runs.

Splitting reviews into focused passes directly addresses attention dilution when processing many files at once. Option B shifts burden to developers without improving the system. Option C misunderstands that larger context windows don't solve attention quality issues. Option D would suppress detection of real bugs by requiring consensus on issues that may only be caught intermittently.

What Will NOT Be On the Exam

The official guide explicitly excludes the following — don't waste study time here:

Fine-tuning Claude models or training custom models
Claude API authentication, billing, or account management
Deploying or hosting MCP servers (infrastructure, networking, containers)
Claude's internal architecture, training process, or model weights
Constitutional AI, RLHF, or safety training methodologies
Embedding models or vector database implementation details
Computer use (browser automation, desktop interaction)
Vision/image analysis capabilities
Streaming API implementation or server-sent events
Rate limiting, quotas, or API pricing calculations
OAuth, API key rotation, or authentication protocol details
Specific cloud provider configurations (AWS, GCP, Azure)
Performance benchmarking or model comparison metrics
Prompt caching details (beyond knowing it exists)
Token counting algorithms or tokenization specifics

The 7-Step Preparation Plan

From the official exam guide — these are the Anthropic-recommended activities, not speculation:

Build a complete agentic loop with the Claude Agent SDK. Implement tool calling, error handling, and session management. Practice spawning subagents and passing context between them explicitly.
Configure Claude Code for a real project. Set up a CLAUDE.md hierarchy, create path-specific rules in .claude/rules/, build a custom skill with context: fork, and integrate at least one MCP server.
Design and test MCP tools. Write descriptions that clearly differentiate similar tools. Implement structured error responses with error categories and retryable flags. Test tool selection reliability with ambiguous requests.
Build a structured data extraction pipeline. Use tool_use with JSON schemas, implement validation-retry loops, design schemas with optional/nullable fields, and practice batch processing with the Message Batches API.
Practice prompt engineering techniques. Write few-shot examples for ambiguous scenarios. Define explicit review criteria to reduce false positives. Design multi-pass review architectures for large code reviews.
Study context management patterns. Practice extracting structured facts from verbose tool outputs, implementing scratchpad files for long sessions, and designing subagent delegation to manage context limits.
Review escalation and human-in-the-loop patterns. Understand when to escalate (policy gaps, customer requests, inability to progress) versus resolve autonomously. Practice designing human review workflows with confidence-based routing.

✅ Access the Exam Register at anthropic.skilljar.com. First 5,000 partner employees get free access. Practice exam link is provided separately after registration.

References

Anthropic — Claude Certified Architect Foundations Exam Guide (v0.1) — Official exam guide covering all 5 domains, 30 task statements, 12 sample questions, and the full content outline. Source for all domain weights, task statements, and sample questions in this article. ↗ link
Anthropic — Claude Partner Network Announcement — Details on the $100M partner network, anchor partners (Accenture, Cognizant, Deloitte, Infosys), and access terms for the first 5,000 partner employees. ↗ link
Anthropic — Claude Agent SDK Documentation — Reference for agentic loops, subagent spawning via the Task tool, PostToolUse hooks, and session management patterns. ↗ link
Anthropic — Model Context Protocol (MCP) Specification — Official spec for MCP server configuration, tool and resource interfaces, error handling, and .mcp.json structure. ↗ link
Anthropic — Claude Code Documentation — Reference for CLAUDE.md hierarchy, .claude/rules/ glob patterns, custom slash commands, skills frontmatter, plan mode vs. direct execution, and CI/CD integration (-p flag). ↗ link
Anthropic — Message Batches API — Documentation on the 50% cost savings, 24-hour processing window, custom_id correlation, and limitations (no multi-turn tool calling support). ↗ link