"The agent that grows with you."
862 stars. 135 forks. Three weeks old. Nous Research, the lab behind the Hermes-3, Nomos, and Psyche model families, released Hermes Agent on February 26, 2026. It is not a new model. It is not a coding assistant wrapper. It is a self-hosted autonomous agent designed around a single thesis: capability should accumulate at the agent layer, not just the model layer.
The Ephemeral Agent Problem
Every mainstream AI agent today has the same fundamental flaw: it wakes up as a stranger.
You spend an hour explaining your codebase, your preferences, your deployment environment. The session ends. The next day, you start over. The agent doesn't remember that you prefer TypeScript over JavaScript, that your staging server lives at a specific address, or that the last three attempts to deploy via GitHub Actions required a specific environment variable. The context window is not memory; it's a whiteboard erased at the end of every session.
The industry has explored partial solutions. Long context windows help but add cost and latency proportional to history length. RAG from conversation logs retrieves what was said about a topic but not how to do it. Vector stores of prior conversations are useful but brittle without semantic structure. None of these supplies what is actually missing: synthesized, executable procedural knowledge that improves over time.
What Hermes Actually Is
Hermes Agent is a self-hosted autonomous agent framework with four distinguishing architectural commitments:
- Synthesized Skill Documents that capture how to do things, not just what was said.
- Six execution backends: it runs on infrastructure, not just your laptop.
- Telegram, Discord, Slack, WhatsApp, Signal: one process, all platforms.
- A model trained specifically for tool-calling accuracy and long-range planning.
```bash
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
hermes setup    # configure LLM provider
hermes          # start chatting
hermes gateway  # start messaging gateway (Telegram, Discord, etc.)
```
The Closed Learning Loop: From Tasks to Skills
The most technically interesting piece of Hermes is its skill system. When the agent completes a complex task (deploying a service, debugging a pipeline, running a data analysis workflow), it synthesizes the experience into a Skill Document: a structured markdown file capturing the procedure, the gotchas, the environment dependencies, and the decision logic.
These are not raw conversation logs. They are synthesized procedural artifacts, closer to a well-written runbook than a transcript. They follow the agentskills.io open standard, making them portable: any compatible agent can load and execute them.
The key distinction from RAG is procedural vs. declarative memory. A RAG system retrieves what was said about a topic. A Skill Document captures how to do the thing, structured for execution rather than reference. The next time a similar task appears, the agent queries its own skill library first using FTS5 full-text search. The skill improves with each use: edge cases get added, failure modes get documented, environment-specific notes accumulate.
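The "query the skill library first" step can be illustrated with SQLite's FTS5 extension, which ships with most Python builds. This is a minimal sketch, not Hermes's actual schema: the table layout, the function names `build_skill_index` and `find_skills`, and the sample skills are all assumptions made for illustration.

```python
import sqlite3


def build_skill_index(skills):
    """Create an in-memory FTS5 index over (name, body) pairs.

    The two-column layout is a hypothetical schema, not Hermes's own.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE skills USING fts5(name, body)")
    conn.executemany("INSERT INTO skills (name, body) VALUES (?, ?)", skills)
    return conn


def find_skills(conn, query, limit=3):
    """Return skill names matching every query term, best BM25 match first.

    The query is passed to FTS5 as-is, so it should contain plain words;
    FTS5 operators and punctuation would need escaping in real use.
    """
    rows = conn.execute(
        "SELECT name FROM skills WHERE skills MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    index = build_skill_index([
        ("deploy-staging",
         "Deploy the service to staging via GitHub Actions. "
         "Set the required environment variable first."),
        ("debug-pipeline",
         "Debug a failing data pipeline by checking logs and "
         "rerunning the failed step."),
    ])
    print(find_skills(index, "deploy staging"))
```

Full-text search of this kind matches on keywords, so it only finds a skill when the task description shares terms with the skill text. That is one reason a well-structured, consistently worded Skill Document is easier to retrieve than a raw transcript.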
Cross-Session Recall and User Modeling
FTS5 session search with LLM summarization indexes all past sessions so the agent can pull up relevant prior context semantically ("last time I tried this deployment...") without you needing to remember or re-paste it.
Honcho user modeling (from Plastic Labs) builds a structured model of the user across sessions โ preferences, working patterns, technical context, communication style. Updated through interaction, not a static profile. Over months of use, the agent develops a representation of you specifically.
The third layer is agent-curated memory nudges: the agent itself decides when something is worth persisting to long-term memory. Rather than saving everything (noisy) or nothing (the current default for most agents), it exercises judgment about what constitutes durable knowledge.
Persistent Machine Access: Living on Infrastructure
A major friction point with AI agents is the "execution gap": they write code but can't interact with the real world without manual intervention. Hermes closes this with persistent dedicated machine access via six backends:
| Backend | Use Case | Notes |
|---|---|---|
| Local | Direct host machine access | Default for personal setups |
| Docker | Isolated, reproducible containers | Safe code execution without polluting host |
| SSH | Remote servers and cloud instances | Long-running tasks, background processes |
| Singularity | HPC container environments | Research clusters, academic compute |
| Daytona | Serverless persistence | Hibernates when idle, near-zero cost between sessions |
| Modal | Serverless scaling | Heavy workloads that need elastic compute |
The practical implication: start a long-running EDA on a remote server via SSH, disconnect, return later. The agent maintains the terminal state, handles background processes, and tracks file system changes independently. It isn't simulating a conversation; it's managing a workspace.
Daytona and Modal deserve special mention. They offer serverless persistence: your agent's environment hibernates when idle and wakes on demand, costing nearly nothing between sessions. This makes "run it on a $5 VPS" a genuinely viable deployment model rather than a marketing claim.
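To make the trade-offs in the backend table concrete, here is a small sketch that encodes the table as data and picks a backend from a few workload traits. The `Backend` class, the `pick_backend` helper, and its priority order are hypothetical; Hermes does not necessarily expose anything like this, and the rules simply restate the "Use Case" column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    use_case: str
    serverless: bool  # True where the table describes the backend as serverless


# Rows copied from the backend table above.
BACKENDS = [
    Backend("Local", "Direct host machine access", False),
    Backend("Docker", "Isolated, reproducible containers", False),
    Backend("SSH", "Remote servers and cloud instances", False),
    Backend("Singularity", "HPC container environments", False),
    Backend("Daytona", "Serverless persistence", True),
    Backend("Modal", "Serverless scaling", True),
]


def pick_backend(*, hpc=False, elastic=False, mostly_idle=False,
                 remote=False, isolated=False):
    """Return a backend name for the given traits.

    The priority order is an illustrative assumption: the most
    specialized requirement wins, and Local is the fallback.
    """
    if hpc:
        return "Singularity"
    if elastic:
        return "Modal"
    if mostly_idle:
        return "Daytona"
    if remote:
        return "SSH"
    if isolated:
        return "Docker"
    return "Local"


if __name__ == "__main__":
    print(pick_backend(remote=True))
    print([b.name for b in BACKENDS if b.serverless])
```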
The Messaging Gateway
Most technical agents are confined to a CLI or web dashboard. Hermes prioritizes accessibility through a unified messaging gateway: Telegram, Discord, Slack, WhatsApp, Signal, and CLI, all from a single gateway process.
This enables a continuous feedback loop. Start a task at your workstation. Receive a completion notification via Telegram. Send follow-up instructions from your phone. The agent processes voice memos and executes within its persistent environment. Cross-platform conversation continuity means picking up on Discord where you left off on Telegram.
For engineers who want an agent that's always reachable without maintaining an always-open terminal, this is the right design. Talk to it from anywhere; it works on your infrastructure.
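Cross-platform continuity comes down to mapping each platform identity onto one shared conversation. The sketch below shows that idea in isolation; `ConversationStore` and its methods are hypothetical names, and a real gateway would also handle authentication, persistence, and message delivery.

```python
class ConversationStore:
    """Keeps one history per user, regardless of which platform a message came from."""

    def __init__(self):
        self._identity = {}  # (platform, platform_user_id) -> canonical user
        self._history = {}   # canonical user -> list of (platform, text)

    def link(self, user, platform, platform_user_id):
        """Register a platform account as belonging to a canonical user."""
        self._identity[(platform, platform_user_id)] = user

    def receive(self, platform, platform_user_id, text):
        """Append an incoming message to the sender's shared history.

        Raises KeyError for an account that was never linked.
        """
        user = self._identity[(platform, platform_user_id)]
        self._history.setdefault(user, []).append((platform, text))
        return user

    def history(self, user):
        """Return a copy of the user's history across all platforms."""
        return list(self._history.get(user, []))


if __name__ == "__main__":
    store = ConversationStore()
    store.link("alice", "telegram", "tg-1")
    store.link("alice", "discord", "dc-9")
    store.receive("telegram", "tg-1", "start the deploy")
    store.receive("discord", "dc-9", "how did it go?")
    print(store.history("alice"))
```

Because both accounts resolve to the same canonical user, the Discord message lands in the same history as the Telegram one, which is what lets a conversation continue across platforms.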
Hermes-3 and Atropos RL
Under the hood is a ReAct loop: Observe → Reason → Act. The agent reads terminal output or file contents, analyzes the current state against the goal, then executes a command or calls a tool.
This is powered by Hermes-3 (Llama 3.1-based), trained with Atropos RL, a reinforcement learning framework developed by Nous Research that specifically targets tool-calling accuracy and long-range planning. The training objective is not general chat quality; it's reliable execution across multi-step workflows without getting "lost." This specificity matters: a model optimized for tool use can outperform a general chat model on agentic tasks even if the general model scores higher on general benchmarks.
That said, Hermes is fully model-agnostic. It works with OpenRouter (200+ models), OpenAI, Nous Portal, z.ai/GLM, Kimi/Moonshot, MiniMax, or your own endpoint. Switch with hermes model, no code changes.
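The Observe, Reason, Act cycle described above can be sketched as a plain loop. This is a generic illustration of the ReAct pattern, not Hermes's implementation: `react_loop` and its three callbacks are hypothetical, and in a real agent `reason` would be a model call rather than a hard-coded rule.

```python
def react_loop(observe, reason, act, max_steps=10):
    """Run observe -> reason -> act until `reason` returns None or steps run out.

    Returns the list of (observation, action) pairs that were executed.
    """
    trace = []
    for _ in range(max_steps):
        observation = observe()       # e.g. read terminal output or a file
        action = reason(observation)  # compare the state against the goal
        if action is None:            # goal reached, nothing left to do
            break
        act(action)                   # e.g. run a command or call a tool
        trace.append((observation, action))
    return trace


if __name__ == "__main__":
    # Toy environment: the "goal" is to raise a counter to 3.
    state = {"count": 0}

    def observe():
        return state["count"]

    def reason(count):
        return "increment" if count < 3 else None

    def act(action):
        if action == "increment":
            state["count"] += 1

    print(react_loop(observe, reason, act))
```

The `max_steps` budget is the simplest guard against an agent that never decides it is finished; long-range planning is largely about reaching the stopping condition before such a budget runs out.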
agentskills.io: The Open Standard Play
Hermes skills follow the agentskills.io open standard: portable SKILL.md files that any compatible agent can load and execute. The Skills Hub at agentskills.io provides a community repository of contributed skills.
This is a meaningful architectural bet. If agentskills.io gains traction as the common format, skills become ecosystem assets rather than proprietary lock-in. An agent trained on your specific workflows via Hermes could share those skills with other compatible agents in your stack. The open standard also means Nous Research can benefit from community-contributed skills without needing to build everything in-house.
Who Should Run Hermes
- AI/ML engineers who want a persistent dev environment that accumulates knowledge about their specific stack
- Self-hosters running workloads on VPS/cloud who want Telegram/Discord-native control without a web dashboard
- Researchers interested in contributing training trajectories back to the Hermes-3 model family via Atropos RL
- Teams who need an agent that maintains state across multi-day projects and long-running background tasks
- Anyone frustrated by starting over every session
Limitations and Open Questions
- Model ceiling: Hermes-3 is Llama 3.1-based. At the frontier, Claude 3.5/4, GPT-4o, and Gemini 2.0 likely outperform it on complex reasoning. Using Hermes with OpenRouter on a stronger model may be the right production choice for demanding tasks.
- Self-improvement without oversight: Skills self-improve during use, but bad patterns can be reinforced too. A skill that contains an incorrect shortcut will keep getting used and refined in the wrong direction unless manually corrected.
- Multi-user support: The current architecture appears single-user focused. Teams wanting shared agent memory and skill libraries will need to build their own coordination layer.
- Serverless cold starts: Daytona/Modal serverless backends offer near-zero idle cost but introduce latency on wake. For latency-sensitive workflows, always-on VPS may be preferable.
- Three weeks old: Production stability is unknown. The skill system in particular (agent-curated nudges, self-improvement) will take real usage data to validate at scale.
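One way to mitigate the self-improvement risk listed above is to track how each skill performs and flag it for human review when it drifts. The sketch below is an assumption about how such a check could look, not a Hermes feature; `SkillStats`, `needs_review`, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SkillStats:
    name: str
    successes: int = 0
    failures: int = 0
    revisions_since_review: int = 0

    def record_use(self, succeeded, revised=False):
        """Log one use of the skill and whether the agent rewrote it afterwards."""
        if succeeded:
            self.successes += 1
        else:
            self.failures += 1
        if revised:
            self.revisions_since_review += 1


def needs_review(stats, max_revisions=5, min_success_rate=0.8, min_uses=5):
    """Flag a skill that was revised many times unreviewed or that fails too often.

    The thresholds are illustrative defaults, not tuned values.
    """
    if stats.revisions_since_review >= max_revisions:
        return True
    uses = stats.successes + stats.failures
    return uses >= min_uses and stats.successes / uses < min_success_rate


if __name__ == "__main__":
    stats = SkillStats("deploy-staging")
    for outcome in (True, True, False, True, False):
        stats.record_use(outcome, revised=not outcome)
    print(needs_review(stats))
```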
Conclusion
Hermes Agent is the most credible open-source attempt at solving the ephemeral agent problem to date. The architecture is coherent: Skill Documents address the procedural memory gap; persistent machine backends address the execution gap; the messaging gateway addresses the interface gap; Honcho addresses the user modeling gap. These are the right problems.
The bet Nous Research is making is that the next frontier in AI capability isn't just better models; it's better accumulation. An agent that compounds experience across sessions will outperform a smarter agent that starts fresh every time, at least for the kinds of long-horizon, environment-specific work that defines real engineering workflows.
That thesis is testable. Install it. Give it complex, recurring tasks over several weeks. Let it build skills. See whether the skill library actually improves your productivity or becomes noisy and stale. The answer will tell you a lot about whether this architectural direction is right.
Key Takeaways
1. Skill Documents are the core innovation: synthesized procedural memory that improves with use. Not chat logs. Not embeddings. Executable runbooks created from experience.
2. Runs on infrastructure, not just your laptop: 6 backends including serverless Daytona/Modal. Start tasks, disconnect, get notified. The agent maintains state.
3. agentskills.io skills are portable: if you use OpenClaw, your skills are already in the right format. The ecosystems share a standard.
4. Three weeks old, use with that in mind: the architecture is sound; production stability and skill quality at scale are still being established.