Build a Coding Agent from Scratch

Overview & Ground Rules

You're a staff-level software engineer. You can code. You understand architecture, debugging, and shipping software. The gap is that you have no AI/LLM-specific knowledge. This plan bridges that gap using only your existing engineering skills.

Core assumption — You have ~5 hours per week devoted to this. That's roughly 45 minutes on 3-4 evenings and one longer 3-hour block on Saturday. If you have more, accelerate; if less, extend the timeline proportionally. Stick to the plan — don't skip weeks, don't jump between topics.

What You'll Build

By the end of 12 weeks, you'll have a working coding agent — an AI-powered assistant that can read your codebase, propose changes, write patches, and optionally apply them. Along the way, you'll learn the patterns that scale to building a general-purpose agent orchestrator in the mold of Hermes, OpenClaw, or CrewAI — where multiple agents coordinate to solve complex tasks.

By the End, You'll Be Able To:

Design and implement agent architectures — single agent and multi-agent orchestrations
Write effective prompts — the #1 skill in AI development, and something you'll use every day
Integrate LLM APIs — OpenAI, Anthropic, local models via OpenAI-compatible endpoints
Build agents that use tools — file I/O, code execution, web search, databases
Ship a production-ready SaaS — either sell the agent as a service, white-label it, or sell your expertise building agents for clients

🎯 Final Milestone (Week 12)

A deployed agent that can take a feature request, read your codebase, design a solution, write the code, and apply the changes — all with human-in-the-loop approval.

The Three Pillars

Everything you need to learn falls into three buckets. Each week focuses on one pillar, in order:

🧠

Pillar 1: LLM Fundamentals

How models work, what prompts do, tokenization, temperature, contexts, costs, and limitations

🔧

Pillar 2: Agent Patterns

Prompt chains, tool use, function calling, structured output, multi-agent orchestration

🚀

Pillar 3: Ship & Monetize

API design, deployment, UX, pricing models, positioning, and go-to-market strategy

Weeks 1–2: LLM Foundations

Pillar 1

Your goal: understand how LLMs work at a practical level so you can reason about them like a system. No math required. Focus on intuition, engineering patterns, and cost awareness.

Sunday (Evening, 1h): What Is a Large Language Model?

Watch: "The spelled-out intro to LLMs" by 3Blue1Brown on YouTube (3Blue1Brown YouTube). ~2 hours if you want the full series, but the first two videos will give you the core intuition.
Read: "What are Large Language Models?" by Google — a high-level overview of how models are trained, what tokens are, and what they can generate.
Key takeaway: LLMs are autoregressive token predictors — they don't "reason," they predict the next tokens based on context. This shapes everything about how you design agents.
Do: OpenAI's ChatGPT and Claude. Play with the model. Try prompting it on things you'd normally code. Read its output critically. Notice what it gets wrong. This is your first data set.

Tuesday (Evening, 1h): Prompt Engineering Basics

Read: OpenAI's "Prompt Engineering" guide — the official documentation. OpenAI Prompt Guide
Read: Anthropic's "Prompt Engineering" documentation. Anthropic Prompt Guide
Key concepts: System prompts, few-shot examples, chain-of-thought, structured output, temperature, max tokens, stop sequences.
Do: Write 10 prompts for the same task (e.g., "explain a software concept") and compare outputs. Try zero-shot vs. few-shot vs. chain-of-thought. This is your most important hands-on exercise — you're learning the #1 skill in AI development.

Thursday (Evening, 1h): API & Tool Integration

Read: OpenAI API reference. Focus on the chat completions endpoint, message format (system/user/assistant), and response formats. API Reference
Read: OpenAI function calling / tool use docs. Function Calling
Do: Write a Python script that calls the OpenAI API using requests or openai Python SDK. Send a message. Get a response. Log it. This is a real API call — not using ChatGPT directly, but programmatically.

Saturday (3h block): Build a Prompt Runner

Build a Python CLI tool that: takes a prompt from stdin + a model name, calls the OpenAI API, returns the response.
Add support for temperature, max_tokens, system_prompt, and model as arguments.
This is your "Hello World" — a command-line prompt runner. You'll use this throughout the next weeks to experiment before building the full agent.
Deliverable: A working CLI tool. Commit it to GitHub.

Resources for Weeks 1–2:

OpenAI Prompt Engineering Guide (platform.openai.com/docs/guides/prompt-engineering)
Anthropic Prompt Engineering Guide (docs.anthropic.com/prompts)
OpenAI API Quickstart (platform.openai.com/docs/quickstart)
LangChain "Prompting" tutorials — their free courses (python.langchain.com/docs/tutorials)

🎯 Milestone 1 (End of Week 2)

You can call LLM APIs directly, write effective prompts, and understand the limits of what the model can do. You have a CLI tool you'll use throughout.

Weeks 3–4: Prompt Engineering & Structured Output

Pillar 1 → Pillar 2

Your goal: write prompts that produce reliable, structured, reproducible outputs. This is where 80% of AI development happens — prompt writing.

Sunday (Evening, 1h): Advanced Prompt Patterns

Read: "Chain of Thought Prompting" — OpenAI cookbook. OpenAI Techniques
Read: "Structured Output" via JSON mode — OpenAI's structured output format. Structured Outputs
Key concepts: Self-correction, ReAct (Reason + Act), tree of thought, meta-prompting.
Do: Using your prompt runner, experiment with these patterns. Notice which ones produce more reliable code-related outputs.

Tuesday (Evening, 1h): Function Calling Deep Dive

Read: OpenAI's function calling documentation thoroughly. Function Calling Docs
Key concept: Models can call functions you define. They return arguments in a structured format. Your code executes the function and feeds the result back to the model. This is the foundation of agent tool use.
Do: Extend your CLI tool to support function calling. Define a function like read_file(path: str) -> str and have the model call it.

Thursday (Evening, 1h): Structured Output & Validation

Read: Outlines library docs — how to force LLM output into Pydantic models / JSON schemas. Outlines docs
Do: Write a Python function that takes a prompt and a JSON schema, calls the OpenAI API with response_format={"type": "json_schema", ...}, and returns a parsed struct. Test with prompts like "list 5 file patterns to ignore in Python projects" and verify the output matches your schema.
Why this matters: Structured output is how you build agents — the model returns structured data you can use programmatically instead of raw text.

Saturday (3h block): Build a Code Review Agent v1

Build the simplest possible coding agent. It reads a file, sends it to the LLM with instructions like "review this code for bugs, style issues, and improvements," and returns a structured review as JSON.
Deliverable: Python script + function. Accepts a .py or .rs file path from CLI args. Returns JSON with {"bugs": [...], "suggestions": [...], "overall_score": 7}.
Ship it. Commit it. It should work end-to-end.

Resources for Weeks 3–4:

OpenAI Cookbook — function calling & structured output examples (GitHub)
LangChain Function Calling docs (python.langchain.com)
Outlines structured output docs (build-outline.readthedocs.io)
Anthropic's "Building Effective Agents" guide (docs.anthropic.com)

🎯 Milestone 2 (End of Week 4)

You can write prompts that produce structured, tool-driven code outputs. Your code review agent works end-to-end. You understand function calling and structured output.

Weeks 5–6: Your First Real Agent

Pillar 2

Your goal: build an agent that can autonomously use tools (read/write files, execute code) in a loop — the core agent pattern.

Sunday (Evening, 1h): Learn Agent Frameworks

Read: Brief comparisons of agent frameworks. Focus on LangGraph (state-based graph, great for coding agents), OpenAI Agents SDK (minimal, opinionated), and CrewAI (multi-agent orchestration).
Recommendation: Start with LangGraph — it's the most flexible, most documented, and gives you the deepest understanding of agent mechanics because you build the graph yourself rather than hiding behind abstractions.
Why this matters: Frameworks handle the boilerplate (API calls, state management, retry logic, streaming). You'll use one throughout the rest of the plan.

Tuesday (Evening, 1h): LangGraph Quickstart

Read: LangGraph quickstart tutorial. LangGraph docs
Do: Follow the tutorial end-to-end. Build a simple graph with nodes for "user input," "agent," "tool," and "condition" (conditional edge for re-trying).
Key concept: A graph is a set of nodes (functions) connected by edges. Each node has state. Conditional edges decide which node runs next based on the model's output. This is the mental model for every agent.

Thursday (Evening, 1h): Agent with Tool Use

Read: LangGraph agent with tool usage. LangGraph ReAct Agent
Do: Build an agent that can:
- Read any file in a target directory
- Write or modify files in that directory
- Execute code and get the output
This is a Tool-Using Agent — the model decides which tools to call based on the task.

Saturday (3h block): Build a File Organizer Agent

Build an agent that takes a directory path as input, reads all files, and organizes/restructures them based on rules the user provides.
Example: "Move all Python files into a src/ folder, test files into tests/, and create a proper package structure."
Deliverable: A LangGraph agent with file read/write tools that actually restructures a real directory. It reads the directory, the model plans the structure, and the agent executes the moves.
This is your first "real" agent — it observes state, reasons about it, takes action, and handles the loop (read → plan → act → verify).

Why a file organizer, not a code agent?

This is a stepping stone. A file organizer teaches you the full agent loop (observe → plan → act → verify) with files, which is 90% of what a coding agent does. Once this works, the coding agent is just the same pattern with better prompts and more sophisticated tools.

🎯 Milestone 3 (End of Week 6)

You have a working agent that uses tools, reads files, writes files, and executes commands. You understand the LangGraph state machine pattern. You can build any agent from this pattern.

Weeks 7–8: Multi-Agent Orchestration

Pillar 2

Your goal: move from a single agent to a multi-agent system — the kind that powers products like Hermes, OpenClaw, and CrewAI.

Sunday (Evening, 1h): Multi-Agent Architectures

Read: "Multi-Agent Architectures" in OpenAI's guidance docs and CrewAI documentation. Understand the difference between:
- Swarm / sequential — agents pass work to each other in a pipeline
- Supervisor / hierarchical — one agent orchestrates others (like a manager)
- Graph-based — agents are nodes in a graph with conditional routing
Read: LangGraph multi-agent patterns. LangGraph multi-agent

Tuesday (Evening, 1h): Build a Supervisor Agent

Concept: One agent (the supervisor) breaks a task into subtasks, assigns each to a specialized agent, collects the results, and compiles a final output.
Example: User task: "Refactor this module to use async/await." Supervisor agent reads the module, breaks the task into: read files → analyze architecture → write changes → test → review. Each is a sub-agent with its own prompt.
Do: Build this in LangGraph. Two agents: one analyzer, one writer. Supervisor routes between them.

Thursday (Evening, 1h): Agent Communication Patterns

Read: How agents communicate — shared state, message passing, tool outputs. Focus on shared state (simplest, works well with LangGraph).
Read: CrewAI agent communication patterns for comparison. CrewAI docs
Do: Extend your multi-agent system so the analyzer agent writes its findings to shared state, and the writer agent reads them to inform its changes.

Saturday (3h block): Build a Multi-Agent Coding Agent

Build a multi-agent coding agent with at least two roles:
- Planner Agent: reads the codebase, identifies what needs to change, writes a spec
- Builder Agent: takes the spec and writes the actual code
- Reviewer Agent (optional): reviews the changes before applying
Deliverable: Multi-agent LangGraph system. User provides a feature request and a code directory. The planner analyzes, the builder writes, the reviewer approves.
This is core production software — a multi-agent coding assistant.

🎯 Milestone 4 (End of Week 8)

You have a multi-agent system that plans and builds code changes from a feature request. This is the core of products like Cursor, OpenClaw, and Copilot Agents — now in your own codebase.

Weeks 9–10: Tool Integration & Reliability

Pillar 2

Your goal: make the agent production-ready. This is the difference between a toy and a product.

Sunday (Evening, 1h): Advanced Tool Use

Concept: Tools are your agent's hands. More/better tools = more capable agent. Think about what tools a coding agent needs:
- File read/write/directory listing (already done)
- Shell command execution (for running tests, linters)
- Git operations (commit, diff, apply patches — this is critical)
- Web search (for the agent to look up docs, APIs, Stack Overflow)
- Structured output parsing (already done)
Read: LangGraph tool integration patterns. LangGraph Tools

Tuesday (Evening, 1h): Add Git Tool

Implement: A Git tool that can: diff, commit, create branch, show status, show diff — wrapped in a LangGraph tool.
Why this matters: Without git tools, the agent can't safely change code. With them, it can propose changes in a branch, create diffs, and let you review before merging. This is how Cursor/Claude Work operate.
Security: Always require human approval before any tool that modifies files or runs commands. This is "human-in-the-loop" — the agent proposes, you approve.

Thursday (Evening, 1h): Evaluation & Testing Agents

Read: "Evaluating LLM applications" — LangSmith (by LangChain) or Arize Phoenix for agent evaluation. Arize Phoenix
Concept: How do you test an agent? You need regression tests — prompts that should always produce correct results. Track metrics: token cost, response time, tool accuracy, correctness score.
Do: Write 5 test prompts for your coding agent. Each one specifies a code change and a success criterion (e.g., "the agent should write a function that reverses a list in Python. Test: the function exists and is syntactically valid").
Read: "LLM-as-a-Judge" pattern — using a strong model to evaluate another model's output.

Saturday (3h block): Integrate Web Search & Improve Tool Set

Give your agent web search capability (via an API — SearXNG, Tavily, or DuckDuckGo). Now when the agent encounters unfamiliar APIs or concepts, it can look them up.
Test your full pipeline on real codebases. Iterate on prompts until the agent consistently produces usable code changes.
Deliverable: A self-improving coding agent with 6+ tools (file read/write, git, shell, web search, structured output).

🎯 Milestone 5 (End of Week 10)

Your coding agent has real tools, can self-correct using web search, and you can systematically evaluate its outputs. This is no longer a toy — it's a product in beta.

Weeks 11–12: Ship It

Pillar 3

Your goal: deploy a usable product. Whether you sell it as a service, open-source it, or use it as a showcase of your skills — the agent goes live.

Sunday (Evening, 1h): Product Design

Decide your product form. Options:

SaaS: Web app where users submit feature requests and the agent writes the code (like Cursor or Replit Agent)
CLI tool: Command-line agent for developers to run locally (like Copilot CLI or Cursor CLI)
SDK/Library: Embeddable agent framework for other developers to build on (like LangChain or CrewAI)
Service: You use your agent to build coding agents for clients (your own service business)

Recommendation: Start with a CLI tool + API. Deploy the API, write a simple web UI. This gives you the most flexibility — you can later open-source or re-skin for clients.

Tuesday (Evening, 1h): API & Deployment

Build: Expose your agent as an API. FastAPI is the natural choice — same as your ThinkSmart stack. Endpoints: POST /chat (submit a task), GET /task/{id}/status (check progress), GET /task/{id}/result (get output).
Deploy: Deploy to a VPS (Hetzner, Linode, or AWS Lightsail — ~$5-10/month). You're a senior engineer; Docker + nginx on a VPS is well within your comfort zone.

Thursday (Evening, 1h): Frontend / UX

Build: A minimal web interface. Stream the agent's output (like ChatGPT — show tool calls, file changes, reasoning steps in real-time). This is critical UX — users need to see the agent thinking.
Use: FastHTML, Streamlit, or even plain HTML with WebSocket streaming. Your ThinkSmart knowledge of FastAPI + Jinja2 applies directly.
Key UX: Each API call should stream back: the agent's thought process, tool calls, file diffs, and final result. Make this visible and beautiful.

Saturday (3h block): Final Polish & Launch

Finish the agent. Fix any edge cases.
Add .env config, requirements.txt, Dockerfile, README.
Deploy. Test end-to-end with real codebases.
Deliverable: A public URL where anyone can submit a coding task and watch an AI agent work on it. You have a real product.

🎯 Milestone 6 (End of Week 12)

You have a deployed, working coding agent. People can submit tasks to it. You have a product you can sell, open-source, or use as your entry ticket into the AI agent space.

Framework Comparison

Here's a concise comparison of the leading agent frameworks, evaluated for your use case (coding agent, full-time engineer, learning path):

Framework	Language	Best For	Learning Curve	Starter Time
LangGraph	Python / JS	Multi-agent, custom workflows	Medium	~3h to graph
OpenAI Agents SDK	Python / JS	Simple single agents	Easy	~1h to hello-world
CrewAI	Python	Multi-agent orchestration	Easy	~2h to crew
AutoGen (Microsoft)	Python	Conversational multi-agent	Hard	~5h to configure
Google ADK	Python / JS	Google ecosystem apps	Medium	~3h to agent

Recommendation: Start with LangGraph (Python). It gives you maximum control, clear state management, and the deepest understanding of how agents actually work. You'll read less "magic" and more code you understand. If you later want rapid multi-agent setup, CrewAI is a good follow-up — it wraps similar concepts but hides the graph.

Monetization Paths

Once you've built the agent, here's how to make money from it:

Path 1: Sell as a SaaS

Product: Web app where users submit feature requests and the agent generates code
Pricing: $19-99/month per user (like Cursor at $20/mo, Replit Agent at $25/mo)
Positioning: "Your AI coding partner" — compete on UX, speed, and cost-effectiveness
Difficulty: Medium — requires a good UI, payment processing, and scaling infrastructure

Path 2: Sell as a Service

Product: You build coding agents for clients (your own service business)
Pricing: $2,000-10,000 per project (consulting pricing for AI agent development)
Positioning: "We build AI agents for your codebase" — position as a high-end dev shop with AI expertise
Difficulty: Easy — you're the product. Your agent is your proof-of-work.

Path 3: Open Source + Sponsorship

Product: Open-source the agent framework
Monetization: GitHub Sponsors, commercial support, cloud hosting
Difficulty: Hard — requires community building, documentation, and ongoing maintenance

Path 4: Freelance Agent Engineering

Product: You use your agent to build agents for businesses
Pricing: $50-150/hour, or $5,000-50,000 per engagement
Positioning: "Staff engineer who builds AI agents" — rare combination of deep engineering + AI skills
Difficulty: Easiest — leverage your existing senior dev reputation, add AI agent skills as a differentiator

My recommendation: Start with Path 4 (freelance) to validate the market and build credibility. Move to Path 2 (service-based) once you have case studies. Consider Path 1 (SaaS) as a long-term play once the agent is polished enough to productize.

Common Pitfalls to Avoid

❌ Building without prompting first

Don't jump straight into building an agent framework before you've written hundreds of prompts. Prompt writing is the #1 skill in AI development. Spend Weeks 1-4 deliberately on this. Every hour spent on prompts is worth 10 hours spent on code.

❌ Using expensive models for everything

Not every task needs GPT-4o. Use cheaper models (GPT-4o-mini, Claude Haiku, OpenRouter) for simple tasks and reserve the expensive ones for complex reasoning. Track your token costs from Week 1.

❌ Over-engineering the tool set

Start with file read/write and shell execution. Don't add 20 tools upfront. Each tool adds complexity and cost. Add tools based on real agent failures, not hypothetical needs.

❌ Assuming the AI agent will be "right"

Agents make mistakes. They hallucinate, misread code, and produce buggy output. Build in verification at every step — the reviewer agent, syntax validation, test execution. Treat the agent as a junior developer, not a senior one.

❌ Chasing every new framework

Pick LangGraph and stick with it for 8+ weeks. Don't jump to CrewAI, then AutoGen, then LangChain. Master one framework deeply, then you can learn others in days.

Complete Timeline Summary

Weeks	Focus	Deliverable
1–2	LLM Fundamentals & Prompting	Prompt runner CLI tool
3–4	Structured Output & Function Calling	Code review agent
5–6	LangGraph & First Agent	File organizer agent
7–8	Multi-Agent Orchestration	Planner + Builder multi-agent system
9–10	Tools, Tool Use & Reliability	Full coding agent with 6+ tools
11–12	Ship & Monetize	Deployed, public-facing product

The path from engineer to agent builder: 12 weeks, 5 hours/week, 60 hours total. That's less time than most engineering bootcamps. Your existing senior-level software engineering skills mean you can skip 80% of the hard parts (data structures, API design, system architecture, testing). You're adding AI on top of a solid foundation — not building from zero.

Build a Coding Agent from Scratch

Contents click to toggle

Overview & Ground Rules

What You'll Build

By the End, You'll Be Able To:

🎯 Final Milestone (Week 12)

The Three Pillars

Pillar 1: LLM Fundamentals

Pillar 2: Agent Patterns

Pillar 3: Ship & Monetize

Weeks 1–2: LLM Foundations

Sunday (Evening, 1h): What Is a Large Language Model?

Tuesday (Evening, 1h): Prompt Engineering Basics

Thursday (Evening, 1h): API & Tool Integration

Saturday (3h block): Build a Prompt Runner

🎯 Milestone 1 (End of Week 2)

Weeks 3–4: Prompt Engineering & Structured Output

Sunday (Evening, 1h): Advanced Prompt Patterns

Tuesday (Evening, 1h): Function Calling Deep Dive

Thursday (Evening, 1h): Structured Output & Validation

Saturday (3h block): Build a Code Review Agent v1

🎯 Milestone 2 (End of Week 4)

Weeks 5–6: Your First Real Agent

Sunday (Evening, 1h): Learn Agent Frameworks

Tuesday (Evening, 1h): LangGraph Quickstart

Thursday (Evening, 1h): Agent with Tool Use

Saturday (3h block): Build a File Organizer Agent

🎯 Milestone 3 (End of Week 6)

Weeks 7–8: Multi-Agent Orchestration

Sunday (Evening, 1h): Multi-Agent Architectures

Tuesday (Evening, 1h): Build a Supervisor Agent

Thursday (Evening, 1h): Agent Communication Patterns

Saturday (3h block): Build a Multi-Agent Coding Agent

🎯 Milestone 4 (End of Week 8)

Weeks 9–10: Tool Integration & Reliability

Sunday (Evening, 1h): Advanced Tool Use

Tuesday (Evening, 1h): Add Git Tool

Thursday (Evening, 1h): Evaluation & Testing Agents

Saturday (3h block): Integrate Web Search & Improve Tool Set

🎯 Milestone 5 (End of Week 10)

Weeks 11–12: Ship It

Sunday (Evening, 1h): Product Design

Tuesday (Evening, 1h): API & Deployment

Thursday (Evening, 1h): Frontend / UX

Saturday (3h block): Final Polish & Launch

🎯 Milestone 6 (End of Week 12)

Framework Comparison

Monetization Paths

Path 1: Sell as a SaaS

Path 2: Sell as a Service

Path 3: Open Source + Sponsorship

Path 4: Freelance Agent Engineering

Common Pitfalls to Avoid

Complete Timeline Summary