๐Ÿ“บ Watch the video version: ThinkSmart.Life/youtube
๐ŸŽง
Listen to this article

What Is the Claude Certified Architect?

Anthropic just launched its first-ever technical certification โ€” and backed it with one hundred million dollars. The Claude Certified Architect (CCA) โ€” Foundations exam is a 60-question, proctored test designed for solution architects who build production AI applications with Claude.

This isn't a conceptual badge you earn by watching videos. The CCA validates practical judgment: knowing when to use programmatic enforcement vs. prompt guidance, how agentic loops should handle stop_reason, how to write MCP tool descriptions that actually work, and how to manage context in multi-agent systems that don't fall apart under load.

The target candidate has 6+ months of hands-on experience building with the Claude API, Agent SDK, Claude Code, and MCP. They've built real agentic systems and made real architectural tradeoffs. The exam tests whether they made the right ones โ€” or whether they just got lucky.

๐Ÿ“‹ Exam At a Glance 60 multiple-choice questions ยท 4 scenarios (drawn from 6) ยท 720/1000 minimum passing score ยท Closed-book, proctored ยท No penalty for guessing

The $100M Partner Network Behind It

The CCA certification launched alongside Anthropic's Claude Partner Network โ€” a $100 million commitment for 2026 targeting enterprise system integrators. This is not a grassroots community push. It's a deliberate credential-as-moat strategy.

The anchor partners are telling: Accenture is training 30,000 professionals, Cognizant is giving 350,000 associates access, Deloitte and Infosys round out the founding cohort. The first 5,000 partner employees get access free.

The bet is straightforward: enterprise buyers want certified practitioners. If Anthropic controls the credential, they influence who gets hired for Claude deployments โ€” which drives more Claude deployments. AWS did this with its certifications in the early cloud era. Anthropic is running the same playbook.

For individual practitioners, the timing matters. Credentials have the most value when supply is low. The CCA cohort is small right now. That changes fast.

Exam Format & Scoring

The exam uses scenario-based multiple choice: each question has one correct answer and three distractors. Distractors are specifically designed for candidates who know the concepts but lack practical judgment โ€” partial knowledge gets you the wrong answer.

DomainWeightQuestions (~)
Domain 1: Agentic Architecture & Orchestration27%~16
Domain 2: Tool Design & MCP Integration18%~11
Domain 3: Claude Code Configuration & Workflows20%~12
Domain 4: Prompt Engineering & Structured Output20%~12
Domain 5: Context Management & Reliability15%~9

Scores are reported on a 100โ€“1,000 scaled score. The minimum passing score is 720. Scaling accounts for slight variation in difficulty across exam forms โ€” a 720 on a harder form is equivalent to a 720 on an easier one.

The 6 Exam Scenarios

Every question is grounded in one of six realistic production scenarios. On your exam, 4 of these 6 will appear, selected at random. You don't know which four in advance.

โš ๏ธ Strategy Note You can't predict which 4 scenarios appear. Study all 6. Domains 1 (Agentic) and 2 (Tool Design) appear in at least 4 of the 6 scenarios โ€” highest ROI for study time.

Domain 1: Agentic Architecture & Orchestration (27%)

The heaviest domain. Seven task statements covering how you design and implement systems where Claude acts autonomously over multiple steps.

What the Exam Actually Tests

๐Ÿ’ก Core Principle of Domain 1 Prompt instructions have a non-zero failure rate. Any compliance requirement with financial, legal, or safety consequences must be enforced programmatically โ€” not via prompt.

Domain 2: Tool Design & MCP Integration (18%)

How you design tools that Claude actually uses correctly โ€” and how you configure MCP servers.

What the Exam Actually Tests

Domain 3: Claude Code Configuration & Workflows (20%)

How you configure Claude Code for team development โ€” from CLAUDE.md hierarchies to CI/CD integration.

What the Exam Actually Tests

Domain 4: Prompt Engineering & Structured Output (20%)

How you reliably extract structured data and design prompts that produce consistent, accurate results.

What the Exam Actually Tests

Domain 5: Context Management & Reliability (15%)

How you keep long agentic sessions accurate, coherent, and recoverable when things go wrong.

What the Exam Actually Tests

Sample Questions

These 12 questions are drawn from the official practice exam. Each one is a real indicator of exam difficulty and format.

Scenario: Customer Support Resolution Agent
Question 1
Production data shows that in 12% of cases, your agent skips get_customer entirely and calls lookup_order using only the customer's stated name, occasionally leading to misidentified accounts and incorrect refunds. What change would most effectively address this reliability issue?
  • A) Add a programmatic prerequisite that blocks lookup_order and process_refund calls until get_customer has returned a verified customer ID.
  • B) Enhance the system prompt to state that customer verification via get_customer is mandatory before any order operations.
  • C) Add few-shot examples showing the agent always calling get_customer first, even when customers volunteer order details.
  • D) Implement a routing classifier that analyzes each request and enables only the subset of tools appropriate for that request type.
  • โœ“ Correct Answer: A
When a specific tool sequence is required for critical business logic, programmatic enforcement provides deterministic guarantees that prompt-based approaches cannot. Options B and C rely on probabilistic LLM compliance โ€” insufficient when errors have financial consequences. Option D addresses tool availability, not tool ordering.
Question 2
Production logs show the agent frequently calls get_customer when users ask about orders (e.g., "check my order #12345"), instead of calling lookup_order. Both tools have minimal descriptions ("Retrieves customer information" / "Retrieves order details") and accept similar identifier formats. What's the most effective first step to improve tool selection reliability?
  • A) Add few-shot examples to the system prompt demonstrating correct tool selection patterns, with 5โ€“8 examples showing order-related queries routing to lookup_order.
  • โœ“ B) Expand each tool's description to include input formats it handles, example queries, edge cases, and boundaries explaining when to use it versus similar tools.
  • C) Implement a routing layer that parses user input before each turn and pre-selects the appropriate tool based on detected keywords and identifier patterns.
  • D) Consolidate both tools into a single lookup_entity tool that accepts any identifier and internally determines which backend to query.
Tool descriptions are the primary mechanism LLMs use for tool selection. When descriptions are minimal, models lack context to differentiate similar tools. Option B directly addresses the root cause. Few-shot examples (A) add token overhead without fixing the underlying issue. A routing layer (C) is over-engineered. Consolidating tools (D) is a valid architectural choice but requires more effort than a "first step" warrants.
Question 3
Your agent achieves 55% first-contact resolution, well below the 80% target. Logs show it escalates straightforward cases (standard damage replacements with photo evidence) while attempting to autonomously handle complex situations requiring policy exceptions. What's the most effective way to improve escalation calibration?
  • โœ“ A) Add explicit escalation criteria to your system prompt with few-shot examples demonstrating when to escalate versus resolve autonomously.
  • B) Have the agent self-report a confidence score (1โ€“10) before each response and automatically route requests to humans when confidence falls below a threshold.
  • C) Deploy a separate classifier model trained on historical tickets to predict which requests need escalation before the main agent begins processing.
  • D) Use sentiment analysis to detect customer frustration levels and automatically escalate when negative sentiment exceeds a threshold.
Explicit criteria with few-shot examples directly addresses unclear decision boundaries. Option B fails because LLM self-reported confidence is poorly calibrated. Option C is over-engineered โ€” prompt optimization hasn't been tried yet. Option D solves a different problem entirely; sentiment doesn't correlate with case complexity.
Scenario: Code Generation with Claude Code
Question 4
You want to create a custom /review slash command that runs your team's standard code review checklist. This command should be available to every developer when they clone or pull the repository. Where should you create this command file?
  • โœ“ A) In the .claude/commands/ directory in the project repository
  • B) In ~/.claude/commands/ in each developer's home directory
  • C) In the CLAUDE.md file at the project root
  • D) In a .claude/config.json file with a commands array
Project-scoped custom slash commands are stored in .claude/commands/ within the repository. They are version-controlled and automatically available to all developers on clone/pull. ~/.claude/commands/ is for personal commands. CLAUDE.md is for project instructions, not command definitions. Option D describes a configuration mechanism that doesn't exist.
Question 5
You've been assigned to restructure the team's monolithic application into microservices. This will involve changes across dozens of files and requires decisions about service boundaries and module dependencies. Which approach should you take?
  • โœ“ A) Enter plan mode to explore the codebase, understand dependencies, and design an implementation approach before making changes.
  • B) Start with direct execution and make changes incrementally, letting the implementation reveal the natural service boundaries.
  • C) Use direct execution with comprehensive upfront instructions detailing exactly how each service should be structured.
  • D) Begin in direct execution mode and only switch to plan mode if you encounter unexpected complexity during implementation.
Plan mode is designed for complex tasks involving large-scale changes, multiple valid approaches, and architectural decisions. Option B risks costly rework when dependencies are discovered late. Option C assumes you already know the right structure without exploration. Option D ignores that the complexity is already stated in the requirements.
Question 6
Your codebase has distinct areas with different coding conventions: React components (functional with hooks), API handlers (async/await with specific error handling), and database models (repository pattern). Test files are spread throughout alongside the code they test (e.g., Button.test.tsx next to Button.tsx), and you want all tests to follow the same conventions regardless of location. What's the most maintainable approach?
  • โœ“ A) Create rule files in .claude/rules/ with YAML frontmatter specifying glob patterns to conditionally apply conventions based on file paths.
  • B) Consolidate all conventions in the root CLAUDE.md file under headers for each area, relying on Claude to infer which section applies.
  • C) Create skills in .claude/skills/ for each code type that include the relevant conventions in their SKILL.md files.
  • D) Place a separate CLAUDE.md file in each subdirectory containing that area's specific conventions.
.claude/rules/ with glob patterns (e.g., **/*.test.tsx) allows conventions to be automatically applied based on file paths regardless of directory location. Option B relies on inference, making it unreliable. Option C requires manual skill invocation. Option D can't handle test files spread across many directories since CLAUDE.md files are directory-bound.
Scenario: Multi-Agent Research System
Question 7
After running the system on "impact of AI on creative industries," you observe all subagents complete successfully, but the final reports cover only visual arts, completely missing music, writing, and film. The coordinator's logs show it decomposed the topic into three subtasks: "AI in digital art creation," "AI in graphic design," and "AI in photography." What is the most likely root cause?
  • A) The synthesis agent lacks instructions for identifying coverage gaps in the findings it receives from other agents.
  • โœ“ B) The coordinator agent's task decomposition is too narrow, resulting in subagent assignments that don't cover all relevant domains of the topic.
  • C) The web search agent's queries are not comprehensive enough and need to be expanded to cover more creative industry sectors.
  • D) The document analysis agent is filtering out sources related to non-visual creative industries due to overly restrictive relevance criteria.
The coordinator's logs reveal the root cause directly: it decomposed "creative industries" into only visual arts subtasks. The subagents executed their assigned tasks correctly โ€” the problem is what they were assigned. Options A, C, and D incorrectly blame downstream agents that are working correctly within their assigned scope.
Question 8
The web search subagent times out while researching a complex topic. You need to design how this failure information flows back to the coordinator agent. Which error propagation approach best enables intelligent recovery?
  • โœ“ A) Return structured error context to the coordinator including the failure type, the attempted query, any partial results, and potential alternative approaches.
  • B) Implement automatic retry logic with exponential backoff within the subagent, returning a generic "search unavailable" status only after all retries are exhausted.
  • C) Catch the timeout within the subagent and return an empty result set marked as successful.
  • D) Propagate the timeout exception directly to a top-level handler that terminates the entire research workflow.
Structured error context gives the coordinator the information it needs to make intelligent recovery decisions. Option B's generic status hides valuable context. Option C suppresses the error entirely, preventing any recovery and risking incomplete research. Option D terminates the entire workflow unnecessarily when recovery strategies could succeed.
Question 9
During testing, you observe the synthesis agent frequently needs to verify specific claims. Currently this adds 2โ€“3 round trips and increases latency by 40%. Evaluation shows 85% of verifications are simple fact-checks (dates, names, statistics) while 15% require deeper investigation. What's the most effective approach to reduce overhead while maintaining reliability?
  • โœ“ A) Give the synthesis agent a scoped verify_fact tool for simple lookups, while complex verifications continue delegating to the web search agent through the coordinator.
  • B) Have the synthesis agent accumulate all verification needs and return them as a batch to the coordinator at the end of its pass.
  • C) Give the synthesis agent access to all web search tools so it can handle any verification independently.
  • D) Pre-cache likely verification targets before synthesis begins, based on predicted claim types from the research queries.
Option A applies least-privilege: the synthesis agent gets only what it needs for the 85% common case while preserving the existing coordination pattern for complex cases. Option B's batching creates blocking dependencies since synthesis steps may depend on earlier verified facts. Option C over-provisions the synthesis agent. Option D relies on speculative caching that can't reliably predict verification needs.
Scenario: Claude Code for Continuous Integration
Question 10
Your pipeline script runs claude "Analyze this pull request for security issues" but the job hangs indefinitely. Logs indicate Claude Code is waiting for interactive input. What's the correct approach to run Claude Code in an automated pipeline?
  • โœ“ A) Add the -p flag: claude -p "Analyze this pull request for security issues"
  • B) Set the environment variable CLAUDE_HEADLESS=true before running the command
  • C) Redirect stdin from /dev/null: claude "Analyze this pull request for security issues" < /dev/null
  • D) Add the --batch flag: claude --batch "Analyze this pull request for security issues"
The -p (or --print) flag is the documented way to run Claude Code in non-interactive mode: processes the prompt, outputs to stdout, and exits. Options B and D reference non-existent features. Option C uses a Unix workaround that doesn't properly address Claude Code's command syntax.
Question 11
Your team wants to reduce API costs. Two workflows currently use real-time Claude calls: (1) a blocking pre-merge check that must complete before developers can merge, and (2) a technical debt report generated overnight for review the next morning. Your manager proposes switching both to the Message Batches API for its 50% cost savings. How should you evaluate this proposal?
  • โœ“ A) Use batch processing for the technical debt reports only; keep real-time calls for pre-merge checks.
  • B) Switch both workflows to batch processing with status polling to check for completion.
  • C) Keep real-time calls for both workflows to avoid batch result ordering issues.
  • D) Switch both to batch processing with a timeout fallback to real-time if batches take too long.
The Batches API offers 50% savings but has up to 24-hour processing times with no guaranteed latency SLA โ€” unsuitable for blocking pre-merge checks, ideal for overnight jobs. Option B relies on "often faster" completion, which isn't acceptable for blocking workflows. Option C reflects a misconception โ€” batch results can be correlated using custom_id. Option D adds unnecessary complexity.
Question 12
A pull request modifies 14 files across the stock tracking module. Your single-pass review produces inconsistent results: detailed feedback for some files but superficial comments for others, obvious bugs missed, and contradictory feedback โ€” flagging a pattern as problematic in one file while approving identical code elsewhere. How should you restructure the review?
  • โœ“ A) Split into focused passes: analyze each file individually for local issues, then run a separate integration-focused pass examining cross-file data flow.
  • B) Require developers to split large PRs into smaller submissions of 3โ€“4 files before automated review runs.
  • C) Switch to a higher-tier model with a larger context window to give all 14 files adequate attention in one pass.
  • D) Run three independent review passes on the full PR and only flag issues that appear in at least two of the three runs.
Splitting reviews into focused passes directly addresses attention dilution when processing many files at once. Option B shifts burden to developers without improving the system. Option C misunderstands that larger context windows don't solve attention quality issues. Option D would suppress detection of real bugs by requiring consensus on issues that may only be caught intermittently.

What Will NOT Be On the Exam

The official guide explicitly excludes the following โ€” don't waste study time here:

The 7-Step Preparation Plan

From the official exam guide โ€” these are the Anthropic-recommended activities, not speculation:

  1. Build a complete agentic loop with the Claude Agent SDK. Implement tool calling, error handling, and session management. Practice spawning subagents and passing context between them explicitly.
  2. Configure Claude Code for a real project. Set up a CLAUDE.md hierarchy, create path-specific rules in .claude/rules/, build a custom skill with context: fork, and integrate at least one MCP server.
  3. Design and test MCP tools. Write descriptions that clearly differentiate similar tools. Implement structured error responses with error categories and retryable flags. Test tool selection reliability with ambiguous requests.
  4. Build a structured data extraction pipeline. Use tool_use with JSON schemas, implement validation-retry loops, design schemas with optional/nullable fields, and practice batch processing with the Message Batches API.
  5. Practice prompt engineering techniques. Write few-shot examples for ambiguous scenarios. Define explicit review criteria to reduce false positives. Design multi-pass review architectures for large code reviews.
  6. Study context management patterns. Practice extracting structured facts from verbose tool outputs, implementing scratchpad files for long sessions, and designing subagent delegation to manage context limits.
  7. Review escalation and human-in-the-loop patterns. Understand when to escalate (policy gaps, customer requests, inability to progress) versus resolve autonomously. Practice designing human review workflows with confidence-based routing.
โœ… Access the Exam Register at anthropic.skilljar.com. First 5,000 partner employees get free access. Practice exam link is provided separately after registration.

References

  1. Anthropic โ€” Claude Certified Architect Foundations Exam Guide (v0.1) โ€” Official exam guide covering all 5 domains, 30 task statements, 12 sample questions, and the full content outline. Source for all domain weights, task statements, and sample questions in this article. โ†— link
  2. Anthropic โ€” Claude Partner Network Announcement โ€” Details on the $100M partner network, anchor partners (Accenture, Cognizant, Deloitte, Infosys), and access terms for the first 5,000 partner employees. โ†— link
  3. Anthropic โ€” Claude Agent SDK Documentation โ€” Reference for agentic loops, subagent spawning via the Task tool, PostToolUse hooks, and session management patterns. โ†— link
  4. Anthropic โ€” Model Context Protocol (MCP) Specification โ€” Official spec for MCP server configuration, tool and resource interfaces, error handling, and .mcp.json structure. โ†— link
  5. Anthropic โ€” Claude Code Documentation โ€” Reference for CLAUDE.md hierarchy, .claude/rules/ glob patterns, custom slash commands, skills frontmatter, plan mode vs. direct execution, and CI/CD integration (-p flag). โ†— link
  6. Anthropic โ€” Message Batches API โ€” Documentation on the 50% cost savings, 24-hour processing window, custom_id correlation, and limitations (no multi-turn tool calling support). โ†— link