What Is the Claude Certified Architect?
Anthropic just launched its first-ever technical certification โ and backed it with one hundred million dollars. The Claude Certified Architect (CCA) โ Foundations exam is a 60-question, proctored test designed for solution architects who build production AI applications with Claude.
This isn't a conceptual badge you earn by watching videos. The CCA validates practical judgment: knowing when to use programmatic enforcement vs. prompt guidance, how agentic loops should handle stop_reason, how to write MCP tool descriptions that actually work, and how to manage context in multi-agent systems that don't fall apart under load.
The target candidate has 6+ months of hands-on experience building with the Claude API, Agent SDK, Claude Code, and MCP. They've built real agentic systems and made real architectural tradeoffs. The exam tests whether they made the right ones โ or whether they just got lucky.
The $100M Partner Network Behind It
The CCA certification launched alongside Anthropic's Claude Partner Network โ a $100 million commitment for 2026 targeting enterprise system integrators. This is not a grassroots community push. It's a deliberate credential-as-moat strategy.
The anchor partners are telling: Accenture is training 30,000 professionals, Cognizant is giving 350,000 associates access, Deloitte and Infosys round out the founding cohort. The first 5,000 partner employees get access free.
The bet is straightforward: enterprise buyers want certified practitioners. If Anthropic controls the credential, they influence who gets hired for Claude deployments โ which drives more Claude deployments. AWS did this with its certifications in the early cloud era. Anthropic is running the same playbook.
For individual practitioners, the timing matters. Credentials have the most value when supply is low. The CCA cohort is small right now. That changes fast.
Exam Format & Scoring
The exam uses scenario-based multiple choice: each question has one correct answer and three distractors. Distractors are specifically designed for candidates who know the concepts but lack practical judgment โ partial knowledge gets you the wrong answer.
| Domain | Weight | Questions (~) |
|---|---|---|
| Domain 1: Agentic Architecture & Orchestration | 27% | ~16 |
| Domain 2: Tool Design & MCP Integration | 18% | ~11 |
| Domain 3: Claude Code Configuration & Workflows | 20% | ~12 |
| Domain 4: Prompt Engineering & Structured Output | 20% | ~12 |
| Domain 5: Context Management & Reliability | 15% | ~9 |
Scores are reported on a 100โ1,000 scaled score. The minimum passing score is 720. Scaling accounts for slight variation in difficulty across exam forms โ a 720 on a harder form is equivalent to a 720 on an easier one.
The 6 Exam Scenarios
Every question is grounded in one of six realistic production scenarios. On your exam, 4 of these 6 will appear, selected at random. You don't know which four in advance.
- Scenario 1 โ Customer Support Resolution Agent: Claude Agent SDK with MCP tools (
get_customer,lookup_order,process_refund,escalate_to_human). Target: 80%+ first-contact resolution. Tests agentic loops, tool design, escalation patterns. - Scenario 2 โ Code Generation with Claude Code: Team workflow with CLAUDE.md configs, slash commands, plan mode vs. direct execution. Tests Claude Code configuration and context management.
- Scenario 3 โ Multi-Agent Research System: Coordinator + web search, document analysis, synthesis, and report generation subagents. Tests orchestration, context passing, error propagation.
- Scenario 4 โ Developer Productivity with Claude: Codebase exploration and boilerplate generation using Agent SDK with built-in tools (Read, Write, Bash, Grep, Glob) and MCP servers. Tests tool distribution and Claude Code integration.
- Scenario 5 โ Claude Code for CI/CD: Automated code review, test generation, PR feedback in a CI pipeline. Tests non-interactive mode, structured output, prompt engineering for false-positive reduction.
- Scenario 6 โ Structured Data Extraction: Unstructured document โ validated JSON output. Tests schema design, tool_use, validation-retry loops, batch processing, human review routing.
Domain 1: Agentic Architecture & Orchestration (27%)
The heaviest domain. Seven task statements covering how you design and implement systems where Claude acts autonomously over multiple steps.
What the Exam Actually Tests
- Agentic loop control flow: The loop continues when
stop_reason == "tool_use"and terminates whenstop_reason == "end_turn". Anti-patterns: parsing natural language to detect completion, arbitrary iteration caps as the primary stop mechanism, checking for assistant text content as a completion indicator. - Multi-agent hub-and-spoke: A coordinator manages all inter-agent communication. Subagents do NOT inherit coordinator context automatically โ context must be explicitly passed. Parallel subagents are spawned by emitting multiple
Tasktool calls in a single response. - The
Tasktool requirement: For a coordinator to spawn subagents, itsallowedToolsmust include"Task". This is frequently tested. - Programmatic enforcement vs. prompt guidance: When a specific tool sequence is required for critical business logic (e.g., verifying identity before processing a refund), programmatic prerequisites provide deterministic guarantees. Prompt-based approaches are probabilistic โ insufficient for financial or safety-critical operations.
- SDK hooks (
PostToolUse): Used to intercept and normalize tool results before the model processes them (e.g., converting Unix timestamps and ISO 8601 to a uniform format) or to block policy-violating actions (e.g., refunds over $500) and redirect to escalation. - Session management:
--resume <session-name>continues a named session.fork_sessioncreates independent branches from a shared analysis baseline to explore divergent approaches.
Domain 2: Tool Design & MCP Integration (18%)
How you design tools that Claude actually uses correctly โ and how you configure MCP servers.
What the Exam Actually Tests
- Tool descriptions are the selection mechanism: When two tools have similar names and minimal descriptions, Claude misroutes. The fix is always richer descriptions first โ include input formats, example queries, edge cases, and explicit "when to use this vs. that" guidance.
- Too many tools degrades reliability: 18 tools vs. 4โ5 relevant tools. More tools = more decision complexity = more misuse. Restrict each subagent's tool set to what its role requires.
- Structured error responses: MCP tools should return
isError: truewith anerrorCategory(transient/validation/permission/business), anisRetryableboolean, and a human-readable description. Generic "Operation failed" responses prevent intelligent recovery. tool_choiceoptions:"auto"lets Claude decide,"any"guarantees a tool is called (no conversational text), forced selection ({"type": "tool", "name": "..."}) ensures a specific tool is called first.- MCP server scoping: Project-level tooling goes in
.mcp.json(version-controlled, shared). Personal/experimental servers go in~/.claude.json. Environment variable expansion (${GITHUB_TOKEN}) keeps credentials out of the repo. Both scopes are available simultaneously.
Domain 3: Claude Code Configuration & Workflows (20%)
How you configure Claude Code for team development โ from CLAUDE.md hierarchies to CI/CD integration.
What the Exam Actually Tests
- CLAUDE.md hierarchy: User-level โ project-level โ directory-level. Instructions cascade and can be overridden. Use
@importpatterns to compose configs across files. .claude/rules/with glob patterns: YAML frontmatter specifyingpaths: ["src/api/**/*"]orpaths: ["**/*.test.*"]. Rules load automatically when matching files are edited โ deterministic, not inference-based. This is the right answer for path-specific conventions.- Custom slash commands: Project-scoped commands go in
.claude/commands/(version-controlled, available to all devs on clone). Personal commands go in~/.claude/commands/. - Skills in
.claude/skills/: SKILL.md frontmatter supportscontext: fork(runs in isolation),allowed-tools(restrict what the skill can use), andargument-hint. - Plan mode vs. direct execution: Plan mode for: complex multi-file changes, architectural decisions, multiple valid approaches, monolith migrations. Direct execution for: single-file bug fixes, clear well-scoped tasks. The exam tests when NOT to use plan mode as much as when to use it.
- CI/CD non-interactive mode: The
-p/--printflag makes Claude Code non-interactive (outputs to stdout, exits).--output-format jsonand--json-schemafor structured CI output.CLAUDE_HEADLESS=trueand--batchdon't exist โ common distractors.
Domain 4: Prompt Engineering & Structured Output (20%)
How you reliably extract structured data and design prompts that produce consistent, accurate results.
What the Exam Actually Tests
tool_usefor structured output: Defining a JSON schema as a tool and forcing Claude to "call" it is the most reliable way to get structured output. Combine withtool_choice: "any"to guarantee the tool is called.- Schema design: Required vs. optional fields, nullable fields (prevents hallucination for absent info),
enumwith"other"+ detail string patterns, strict mode for syntax error elimination. - Validation-retry loops: When Pydantic or JSON schema validation fails, send a follow-up request with the document, the failed extraction, and the specific validation error. Track which errors are retryable (format mismatches) vs. not (information genuinely absent from source).
- Few-shot prompting: Targeted examples for ambiguous scenarios. Most effective at reducing false positives (e.g., in code review) and improving handling of structural variety in documents. Few-shot beats vague instructions; it does NOT beat tool descriptions for tool selection issues.
- Message Batches API: 50% cost savings, up to 24-hour processing window, no guaranteed latency SLA. Use
custom_idfields for request/response correlation. Does NOT support multi-turn tool calling within a single request. Right for: overnight reports, weekly audits, nightly test generation. Wrong for: blocking pre-merge checks, anything with latency requirements. - Multi-pass and multi-instance reviews: A model reviewing its own output retains reasoning context and is less likely to catch its own errors. An independent second Claude instance (no prior context) is more effective. For large PRs: per-file local analysis passes + separate cross-file integration pass beats single-pass on all files.
Domain 5: Context Management & Reliability (15%)
How you keep long agentic sessions accurate, coherent, and recoverable when things go wrong.
What the Exam Actually Tests
- "Lost in the middle" effect: Models reliably process information at the beginning and end of long inputs. Key facts and summaries belong at the top. Detailed results with section headers follow.
- Trimming verbose tool outputs: A 40-field order lookup returning all 40 fields to context is wasteful and degrades attention. Extract only relevant fields before they accumulate.
- Progressive summarization risks: Condensing numerical values, percentages, dates, and customer-stated expectations into vague summaries destroys precision. Use a persistent "case facts" block with raw transactional data instead.
- Scratchpad files: For long codebase exploration sessions, agents should maintain scratchpad files recording key findings. Reference them for subsequent questions to counteract context degradation. Use
/compactto reduce context usage when it fills with verbose output. - Escalation triggers: Escalate when (1) the customer explicitly requests a human, (2) policy is ambiguous or silent on the specific request, or (3) the agent cannot make meaningful progress. Do NOT use: sentiment scores, self-reported confidence thresholds, or case complexity proxies. These are unreliable.
- Structured error propagation: Subagents should return structured error context: failure type, attempted query, partial results, alternative approaches. The coordinator decides recovery. Anti-patterns: returning empty results as success (suppresses the error), terminating the full workflow on a single subagent failure.
- Human review routing: Use field-level confidence scores calibrated against labeled validation sets. Stratified random sampling for ongoing error rate measurement. Validate accuracy by document type and field segment โ aggregate accuracy metrics (e.g., "97% overall") can mask poor performance on specific sub-segments.
Sample Questions
These 12 questions are drawn from the official practice exam. Each one is a real indicator of exam difficulty and format.
Scenario: Customer Support Resolution Agentget_customer entirely and calls lookup_order using only the customer's stated name, occasionally leading to misidentified accounts and incorrect refunds. What change would most effectively address this reliability issue?get_customer when users ask about orders (e.g., "check my order #12345"), instead of calling lookup_order. Both tools have minimal descriptions ("Retrieves customer information" / "Retrieves order details") and accept similar identifier formats. What's the most effective first step to improve tool selection reliability?/review slash command that runs your team's standard code review checklist. This command should be available to every developer when they clone or pull the repository. Where should you create this command file?.claude/commands/ within the repository. They are version-controlled and automatically available to all developers on clone/pull. ~/.claude/commands/ is for personal commands. CLAUDE.md is for project instructions, not command definitions. Option D describes a configuration mechanism that doesn't exist.Button.test.tsx next to Button.tsx), and you want all tests to follow the same conventions regardless of location. What's the most maintainable approach?.claude/rules/ with glob patterns (e.g., **/*.test.tsx) allows conventions to be automatically applied based on file paths regardless of directory location. Option B relies on inference, making it unreliable. Option C requires manual skill invocation. Option D can't handle test files spread across many directories since CLAUDE.md files are directory-bound.claude "Analyze this pull request for security issues" but the job hangs indefinitely. Logs indicate Claude Code is waiting for interactive input. What's the correct approach to run Claude Code in an automated pipeline?-p (or --print) flag is the documented way to run Claude Code in non-interactive mode: processes the prompt, outputs to stdout, and exits. Options B and D reference non-existent features. Option C uses a Unix workaround that doesn't properly address Claude Code's command syntax.custom_id. Option D adds unnecessary complexity.What Will NOT Be On the Exam
The official guide explicitly excludes the following โ don't waste study time here:
- Fine-tuning Claude models or training custom models
- Claude API authentication, billing, or account management
- Deploying or hosting MCP servers (infrastructure, networking, containers)
- Claude's internal architecture, training process, or model weights
- Constitutional AI, RLHF, or safety training methodologies
- Embedding models or vector database implementation details
- Computer use (browser automation, desktop interaction)
- Vision/image analysis capabilities
- Streaming API implementation or server-sent events
- Rate limiting, quotas, or API pricing calculations
- OAuth, API key rotation, or authentication protocol details
- Specific cloud provider configurations (AWS, GCP, Azure)
- Performance benchmarking or model comparison metrics
- Prompt caching details (beyond knowing it exists)
- Token counting algorithms or tokenization specifics
The 7-Step Preparation Plan
From the official exam guide โ these are the Anthropic-recommended activities, not speculation:
- Build a complete agentic loop with the Claude Agent SDK. Implement tool calling, error handling, and session management. Practice spawning subagents and passing context between them explicitly.
- Configure Claude Code for a real project. Set up a CLAUDE.md hierarchy, create path-specific rules in
.claude/rules/, build a custom skill withcontext: fork, and integrate at least one MCP server. - Design and test MCP tools. Write descriptions that clearly differentiate similar tools. Implement structured error responses with error categories and retryable flags. Test tool selection reliability with ambiguous requests.
- Build a structured data extraction pipeline. Use
tool_usewith JSON schemas, implement validation-retry loops, design schemas with optional/nullable fields, and practice batch processing with the Message Batches API. - Practice prompt engineering techniques. Write few-shot examples for ambiguous scenarios. Define explicit review criteria to reduce false positives. Design multi-pass review architectures for large code reviews.
- Study context management patterns. Practice extracting structured facts from verbose tool outputs, implementing scratchpad files for long sessions, and designing subagent delegation to manage context limits.
- Review escalation and human-in-the-loop patterns. Understand when to escalate (policy gaps, customer requests, inability to progress) versus resolve autonomously. Practice designing human review workflows with confidence-based routing.
References
- Anthropic โ Claude Certified Architect Foundations Exam Guide (v0.1) โ Official exam guide covering all 5 domains, 30 task statements, 12 sample questions, and the full content outline. Source for all domain weights, task statements, and sample questions in this article. โ link
- Anthropic โ Claude Partner Network Announcement โ Details on the $100M partner network, anchor partners (Accenture, Cognizant, Deloitte, Infosys), and access terms for the first 5,000 partner employees. โ link
-
Anthropic โ Claude Agent SDK Documentation โ Reference for agentic loops, subagent spawning via the
Tasktool,PostToolUsehooks, and session management patterns. โ link -
Anthropic โ Model Context Protocol (MCP) Specification โ Official spec for MCP server configuration, tool and resource interfaces, error handling, and
.mcp.jsonstructure. โ link -
Anthropic โ Claude Code Documentation โ Reference for CLAUDE.md hierarchy,
.claude/rules/glob patterns, custom slash commands, skills frontmatter, plan mode vs. direct execution, and CI/CD integration (-pflag). โ link -
Anthropic โ Message Batches API โ Documentation on the 50% cost savings, 24-hour processing window,
custom_idcorrelation, and limitations (no multi-turn tool calling support). โ link