Introduction
Running an AI agent 24/7 is one of the most powerful productivity setups you can build today. An agent like OpenClaw monitors your messages, checks your email, runs scheduled tasks, spawns sub-agents for research — all while you sleep. But that power comes at a price, and without understanding where your money goes, costs can spiral from a manageable $10/month to a painful $300/month before you notice.
I run OpenClaw around the clock — multiple cron jobs, sub-agents for content creation, research pipelines, social media automation, news digests. My first month's bill was an eye-opener. After three months of optimization, I've cut costs by over 60% while maintaining the same capability. This guide shares everything I've learned.
We'll cover:
- Where the money actually goes — the six cost drivers most people miss
- Provider pricing breakdown — Anthropic, OpenAI, Google, OpenRouter, and local models compared
- Real-world cost scenarios — from casual $5/month to power-user $300/month
- 10 optimization strategies — each with concrete savings estimates
- OpenClaw-specific configuration — the settings that directly affect your bill
Where the Money Goes
Before optimizing, you need to understand the six main cost drivers of running an always-on AI agent. One ClawHosters case study documented a user going from $150/month to $35/month — a 77% reduction — just by understanding and addressing these drivers.[1]
1. Context Accumulation (The #1 Cost Driver)
Every message you send includes the entire conversation history. After a few hours, you could be sending 50,000+ tokens of history with every single request. One user ran /context detail and found 52,000 tokens of history — 40,000 of it from a debug session two days ago.[1]
2. System Prompts (Repeated Every Turn)
OpenClaw assembles your system prompt from AGENTS.md, SOUL.md, skills, tool definitions, and workspace files. The default config allows up to 150,000 characters for workspace files alone.[2] This entire prompt is sent with every single API call. If your system prompt is 15,000 tokens and you send 100 messages a day, that's 1.5 million input tokens a day on system prompts alone.
3. Heartbeat Polls
OpenClaw sends periodic heartbeat messages to keep the agent alive and responsive. At the default 30-minute interval, that's 48 API calls per day — each one including the full system prompt and recent conversation history. If each heartbeat costs ~$0.01, that's $0.48/day or ~$15/month just on heartbeats.
4. Tool Output Storage
When your agent calls tools (file reads, web scrapes, code execution), the results get stored in conversation history and resent with every subsequent message. A single web scraping job can store 180,000 characters of JSON in the history.[1]
5. Sub-Agent Spawning
Complex tasks spawn sub-agents — each one gets its own system prompt, context, and conversation. A research pipeline that spawns 5 sub-agents effectively multiplies your API cost for that task by five.
6. Output Token Multiplier
Output tokens cost 3–8x more than input tokens across all providers.[1] Verbose agent responses compound this — a 2,000-token response on Claude Opus costs $0.05 in output tokens alone. If your agent is naturally chatty, you're paying dearly for every word.
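To make the arithmetic concrete, here is a small Python sketch of per-request cost from token counts and per-million-token prices (the prices match the table in the next section; the token counts are illustrative):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_mtok: float, output_per_mtok: float) -> float:
    """Dollar cost of one API call from token counts and per-MTok prices."""
    return (input_tokens * input_per_mtok
            + output_tokens * output_per_mtok) / 1_000_000

# Claude Opus ($5 in / $25 out per MTok): a 2,000-token reply costs
# $0.05 in output tokens alone, matching the figure above.
print(request_cost(0, 2_000, 5.00, 25.00))  # → 0.05
```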
Provider Pricing Breakdown
The model you choose is the single biggest cost lever. Here's the current landscape as of February 2026:
| Model | Input / 1M tokens | Output / 1M tokens | Quality | Speed |
|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | Highest | Medium |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Very High | Fast |
| Claude Haiku 4.5 | $1.00 | $5.00 | Good | Very Fast |
| GPT-4o | $2.50 | $10.00 | Very High | Fast |
| GPT-4o mini | $0.15 | $0.60 | Good | Very Fast |
| Gemini 2.5 Pro | $1.25 | $10.00 | Very High | Fast |
| Gemini 2.5 Flash | $0.15 | $0.60 | Good | Very Fast |
| DeepSeek V3.2 | $0.27 | $1.10 | Good | Fast |
| MiniMax M2.5 | ~$0.10 | ~$0.40 | Good | Fast |
| Llama 3.3 (local) | Free | Free | Decent | Hardware-dependent |
Anthropic Claude — The Deep Dive
Claude is the most popular model for OpenClaw, and Anthropic offers a critical cost-saving feature: prompt caching. Cached input tokens cost only $0.30/MTok on Sonnet (vs. $3.00 uncached) — a 90% reduction. Since OpenClaw sends the same system prompt every turn, prompt caching alone can cut your bill in half.[4]
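For illustration, this is roughly what a cacheable system prompt looks like in a raw Anthropic Messages API request. OpenClaw handles this for you, so the payload below is a sketch, not something you need to write — the prompt text is a placeholder:

```python
# Sketch of marking a system prompt cacheable in a raw Anthropic
# Messages API request body (OpenClaw does this automatically).
LONG_SYSTEM_PROMPT = "You are a 24/7 personal agent..."  # placeholder

payload = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    # cache_control on the last system block tells the API to cache the
    # prefix up to that point; later reads cost ~10% of the input price.
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Summarize today's inbox."}],
}
```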
OpenAI — Batch API Discount
OpenAI offers a 50% discount on their Batch API for non-urgent tasks. If your agent runs cron jobs that don't need immediate responses (daily summaries, report generation), you can route those through the Batch API and pay half price. GPT-4o mini at $0.075/$0.30 per MTok is extremely cost-effective for simple tasks.
Google Gemini — The Free Tier Champion
Gemini offers a generous free tier: 15 requests per minute on Gemini Flash. For a personal agent with light use, you can genuinely run OpenClaw for $0/month on Gemini Flash. The trade-off is lower quality on complex reasoning tasks and occasional rate limits.[5]
OpenRouter — Model Mixing
OpenRouter acts as an aggregator, giving you access to 100+ models through one API key. The killer feature: you can switch models per-request. Use Kimi K2.5 (free!) for simple tasks and Claude Sonnet for complex ones. Community reports suggest starting free with Kimi K2.5 on OpenRouter to validate your setup, then upgrading as needed.[6]
Local Models — Zero Marginal Cost
Running models locally through Ollama means $0 per token after the hardware investment. A Mac Mini M4 ($599) runs 7B–13B models comfortably. For OpenClaw heartbeats, simple lookups, and basic chat, local models are more than capable. The math: if you'd spend $30/month on API, local hardware pays for itself in 20 months.
Hosting Costs
OpenClaw needs a server running 24/7. The gateway is lightweight — even a Raspberry Pi 4 can handle it. Here's the hosting landscape:[5]
| Provider | Plan | vCPU | RAM | Monthly | Notes |
|---|---|---|---|---|---|
| Oracle Cloud | ARM Flex | 4 | 4 GB | $0 | Free forever (risk of idle deletion) |
| Hetzner | CAX11 | 2 | 4 GB | ~$4 | Most reliable budget option |
| AWS | t4g.small | 2 | 2 GB | ~$12 | 12-month free trial available |
| GCP | e2-small | 2 | 2 GB | ~$12 | Standard cloud pricing |
| Azure | B2s | 2 | 4 GB | ~$30 | Most expensive option |
| Mac Mini M4 | Local | 10 | 16 GB | $0 | $599 one-time + electricity |
Real-World Cost Scenarios
💚 Light Use — $0–10/month
Profile: Personal assistant, casual chat, few commands per day
- 10–50 messages/day
- Model: Gemini Flash (free) or GPT-4o mini ($3–5/month)
- Hosting: Oracle Cloud free tier or local Mac
- No cron jobs, no sub-agents
Monthly total: $0–10
💙 Medium Use — $15–50/month
Profile: Daily productivity, news digests, social media, team assistant
- 50–200 messages/day
- Model: Claude Sonnet ($15–30/month API)
- Hosting: Hetzner VPS ($4/month)
- A few cron jobs, occasional sub-agents
- Web search API ($3–5/month)
Monthly total: $25–50
💜 Heavy Use — $100–300/month
Profile: What I run — multiple cron jobs, research pipeline, sub-agent swarms, content automation
- 200–500+ messages/day including automated tasks
- Model: Claude Opus/Sonnet for main, sub-agents on cheaper models
- Multiple cron jobs running every 30–60 minutes
- Sub-agents for research posts, video generation, social media
- TTS, web search, S3 storage, YouTube API
Monthly total: $100–300 (unoptimized) → $35–80 (optimized)
10 Cost Optimization Strategies
These are ordered by impact — the first three alone can cut your bill by 70%.
1. Enable Prompt Caching (Savings: 40–60%)
Anthropic's prompt caching stores your system prompt server-side and reuses it across requests. Instead of paying $3.00/MTok for your 15K-token system prompt every time, you pay $0.30/MTok for cached reads. Since the system prompt is identical across requests, this is nearly free money.[4]
OpenClaw supports prompt caching natively with Anthropic models — it's enabled by default. Verify it's working by checking your API dashboard for "cache read" vs "cache miss" ratios.
2. Reset Sessions After Tasks (Savings: 40–60%)
The /clear command resets conversation history. One user found they had 73,000 tokens of history for a task that was done hours ago. Since building the habit of clearing after completed tasks, their average cost per request dropped 47%.[1]
3. Model Routing — Use Cheap Models for Simple Tasks (Savings: 50–80%)
This is the highest-impact strategy. Configure model failover in OpenClaw: Sonnet as primary, Haiku as fallback, Opus only for explicit overrides. Sub-agents should default to the cheapest model that can handle the task.[7]
```yaml
# OpenClaw config example
model: anthropic/claude-sonnet-4-5
model_fallback: anthropic/claude-haiku-4-5
subagent_model: anthropic/claude-haiku-4-5
```
4. Reduce System Prompt Size (Savings: 10–30%)
Trim your AGENTS.md, SOUL.md, and workspace files. Every 1,000 tokens you remove saves ~$0.003 per API call. At 100 calls/day, that's $9/month saved per 1,000 tokens removed. Audit your system prompt with /context detail and cut anything that isn't essential.[2]
5. Optimize Heartbeat Interval (Savings: $5–15/month)
The default 30-minute heartbeat means 48 API calls/day of "nothing happening." Consider increasing to 60 or 90 minutes if you don't need instant responsiveness. Keep the HEARTBEAT.md file small — every byte is sent as context.
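The savings scale linearly with the interval — a quick sketch:

```python
def heartbeats_per_day(interval_min: int) -> int:
    """Heartbeat API calls per day at a given polling interval (minutes)."""
    return 24 * 60 // interval_min

print(heartbeats_per_day(30))  # → 48 (the default)
print(heartbeats_per_day(90))  # → 16, i.e. two-thirds fewer heartbeat calls
```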
6. Limit Context Window (Savings: 20–40%)
Set contextTokens to 50,000–80,000 instead of the maximum 400,000. OpenClaw automatically compacts older messages when the limit is reached. Most use cases work fine at 80K tokens — that covers 85–90% of tasks.[1]
7. Sub-Agent Model Selection (Savings: 30–60%)
Don't spawn sub-agents on Opus for a web search. Configure sub-agents to use Haiku or GPT-4o mini for routine tasks. Reserve expensive models for tasks that genuinely need them — complex reasoning, nuanced writing, multi-step analysis.[7]
8. Local Inference for Simple Tasks (Savings: Variable)
If you have a GPU rig or Mac with Apple Silicon, route heartbeats and simple lookups through a local model via Ollama. Zero marginal cost for those requests. The quality trade-off is real for complex tasks, but heartbeat checks don't need Opus-level intelligence.
9. Batch API for Non-Urgent Tasks (Savings: 50%)
OpenAI's Batch API offers 50% off for tasks that can wait up to 24 hours. Daily summaries, report generation, content scheduling — anything that doesn't need an immediate response can go through batch processing.
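Batch jobs are submitted as a JSONL file with one request per line. Here is a sketch of building that file for a daily-summary cron job — the prompts and IDs are made up; after writing the file you would upload it and create a batch with a 24-hour completion window via the OpenAI SDK:

```python
import json

# Hypothetical daily-summary prompts to run at half price via the Batch API.
summaries = [
    "Summarize yesterday's RSS items.",
    "Draft the weekly report outline.",
]

lines = []
for i, prompt in enumerate(summaries):
    lines.append(json.dumps({
        "custom_id": f"digest-{i}",      # your ID, echoed back in the results
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": prompt}],
        },
    }))

# Write the JSONL input file; the upload and batch creation happen
# separately through the OpenAI SDK.
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines))
```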
10. Disable Extended Thinking for Simple Tasks (Savings: 10–20%)
Extended thinking (reasoning) generates additional tokens that you pay for. For simple queries, file operations, and routine tasks, disable it. Save thinking for complex coding, analysis, and multi-step reasoning.
```yaml
# In OpenClaw config
thinking: low  # or "off" for simple tasks
# Use /reasoning to toggle during a session
```
Prompt Caching Deep Dive
Anthropic's prompt caching is the single most impactful cost feature for OpenClaw users. Here's how it works:
- Cache Write: First request with a given system prompt costs 1.25x normal input price (one-time)
- Cache Read: Subsequent requests with the same prefix cost only 0.1x normal input price (90% savings)
- Cache TTL: 5 minutes — as long as you make at least one request every 5 minutes, the cache stays warm
- Minimum Size: 1,024 tokens on Sonnet/Opus, 2,048 tokens on Haiku
Note the 5-minute TTL: a 30-minute heartbeat alone will not keep the default cache warm, so you get cache reads when requests land within five minutes of each other (active conversations, bursts of tool calls) — or if you opt into Anthropic's extended 1-hour cache, which has a higher write price. When caching applies, your 15K-token system prompt goes from costing $0.045 per request to $0.0045 — at 100 requests/day, that's the difference between $135/month and $13.50/month on system prompts alone.
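The arithmetic behind those numbers, assuming 100 requests per day:

```python
PROMPT_TOKENS = 15_000     # system prompt size from the example above
REQUESTS_PER_DAY = 100     # assumed request volume
UNCACHED = 3.00            # Sonnet input price, $ per MTok
CACHED = 0.30              # Sonnet cache-read price, $ per MTok

def monthly_prompt_cost(price_per_mtok: float) -> float:
    """System-prompt spend over a 30-day month at the volume above."""
    per_request = PROMPT_TOKENS * price_per_mtok / 1_000_000
    return round(per_request * REQUESTS_PER_DAY * 30, 2)

print(monthly_prompt_cost(UNCACHED))  # → 135.0
print(monthly_prompt_cost(CACHED))    # → 13.5
```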
Model Routing in Practice
Smart model routing means matching task complexity to model capability. Here's a practical routing table:
| Task Type | Recommended Model | Cost per Call |
|---|---|---|
| Heartbeat checks | Haiku / Local (Ollama) | $0.001–Free |
| Simple file operations | Haiku / GPT-4o mini | $0.001–0.003 |
| Web search summaries | Sonnet / GPT-4o | $0.01–0.03 |
| Content writing | Sonnet | $0.02–0.05 |
| Complex research | Opus / Sonnet | $0.05–0.15 |
| Code debugging | Opus | $0.10–0.30 |
| Sub-agent tasks | Haiku / Sonnet | $0.005–0.03 |
The ZenVanRiel optimization guide reports 50%+ savings from intelligent model routing alone.[7] The key insight: most agent tasks don't require frontier-model intelligence. A quick file lookup doesn't need the same reasoning power as debugging a distributed system.
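A routing layer can be as simple as a lookup table keyed by task type. A minimal Python sketch — the model IDs mirror the table above, and the task-type names are assumptions, not OpenClaw's actual categories:

```python
# Map task categories to model IDs; unknown tasks fall back to the
# cheapest model, since most agent work doesn't need frontier models.
ROUTES = {
    "heartbeat": "anthropic/claude-haiku-4-5",
    "file_op": "anthropic/claude-haiku-4-5",
    "web_summary": "anthropic/claude-sonnet-4-5",
    "content": "anthropic/claude-sonnet-4-5",
    "research": "anthropic/claude-opus-4-6",
    "debugging": "anthropic/claude-opus-4-6",
}
DEFAULT = "anthropic/claude-haiku-4-5"

def pick_model(task_type: str) -> str:
    """Return the cheapest model believed adequate for this task type."""
    return ROUTES.get(task_type, DEFAULT)

print(pick_model("heartbeat"))  # → anthropic/claude-haiku-4-5
print(pick_model("debugging"))  # → anthropic/claude-opus-4-6
```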
OpenClaw Configuration for Cost Control
These OpenClaw settings directly affect your bill:
Model Selection
```yaml
# Primary model for all interactions
model: anthropic/claude-sonnet-4-5

# Fallback when primary is unavailable/rate-limited
model_fallback: anthropic/claude-haiku-4-5

# Model for sub-agents (use cheaper models!)
subagent_model: anthropic/claude-haiku-4-5
```
Heartbeat Interval
```yaml
# Default is 30 minutes. Increase to save tokens.
heartbeat_interval: 60  # minutes

# Keep HEARTBEAT.md small!
# Every byte is sent as context with each heartbeat
```
Context Limits
```yaml
# Limit context window to control costs
contextTokens: 80000  # default can be up to 400K

# Compaction happens automatically when limit is reached
```
Thinking/Reasoning
```yaml
# Disable extended thinking for simple tasks
thinking: low  # options: off, low, medium, high

# Toggle during session with /reasoning command
```
Built-in Cost Tracking
Use /status to see your current token usage. Monitor your API provider's dashboard for detailed cost breakdowns. Set budget alerts at your provider (Anthropic, OpenAI) to catch runaway costs early.
How to Audit Your Costs
- Check /status — OpenClaw shows token usage for the current session
- API Dashboard — Anthropic Console, OpenAI Dashboard, or OpenRouter show historical usage and costs
- Set Budget Alerts — Most providers support email/webhook alerts at spending thresholds
- Monitor context size — Run /context detail to see what's eating your tokens
- Track per-feature costs — Separate API keys for main agent vs sub-agents to attribute costs
What the Community Says
Real Cost Reports
- "$187 for my first month just playing around" — common experience for users who don't optimize[1]
- "I run production instances for under $35/month" — after applying optimization strategies[1]
- "$5–10/month with Gemini Flash free tier + Hetzner" — the ultra-budget setup[5]
- "95% cost reduction by switching to MiniMax M2.5" — aggressive model downgrade with minimal quality loss[8]
Community Consensus
- Claude Sonnet 4.5 is the sweet spot for quality vs. cost
- Prompt caching is non-negotiable — enable it immediately
- Session resets (/clear) are the easiest win most people miss
- Local models via Ollama are great for heartbeats and simple tasks
- Oracle Cloud free tier is the best hosting deal (with PAYG upgrade for safety)
- Tool-calling reliability matters more than raw intelligence for agent tasks[6]
Hacker News Perspective
The HN community emphasizes cost guardrails for autonomous agents. Projects like AgentGuard auto-kill agents before they burn through budgets. The consensus: always set hard spending limits on API keys, especially for autonomous agents that run unsupervised.
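The guardrail idea boils down to a spend tracker that hard-stops the agent at a threshold. A minimal sketch — illustrative only, not AgentGuard's actual implementation:

```python
class BudgetGuard:
    """Hard spending limit: refuse further API calls once estimated
    spend crosses the budget (illustrative, not AgentGuard's code)."""

    def __init__(self, monthly_budget_usd: float):
        self.budget = monthly_budget_usd
        self.spent = 0.0

    def record(self, cost_usd: float) -> None:
        """Log the cost of a completed call; raise when over budget."""
        self.spent += cost_usd
        if self.spent >= self.budget:
            raise RuntimeError(
                f"Budget exhausted: ${self.spent:.2f} of ${self.budget:.2f}"
            )

guard = BudgetGuard(monthly_budget_usd=50.0)
guard.record(0.05)   # fine — well under budget
# guard.record(49.96)  # would raise RuntimeError and halt the agent
```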
References
1. Cut OpenClaw Token Costs by 77% — ClawHosters, February 2026
2. OpenClaw Context & Token Management — Official Docs
3. OpenClaw Pricing: How Much Does It Actually Cost? — The CAIO, February 2026
4. Prompt Caching — Anthropic Documentation
5. OpenClaw Deploy Cost Guide: $0–8/month — WenHao Yu, February 2026
6. How to Set Up OpenClaw: Your 24/7 AI Agent — MrPrompts, Substack
7. OpenClaw API Cost Optimization: Smart Model Routing — Zen Van Riel, February 2026
8. Cut OpenClaw Costs by 95% — Daily Dose of Data Science, February 2026
9. A Realistic Guide to OpenClaw AI Pricing — Eesel AI
10. AI Model Pricing for OpenClaw Agents — ClawKit, 2026
11. AgentGuard — Auto-kill AI agents before budget burn — GitHub
12. OpenClaw AI Cost Analysis: Deployment Path Comparison — TrendFingers