I came across a video about Hermes, an AI agent with self-write skills that lets it learn and improve over time. It got me curious: could my own AI agent do something similar?
So I asked.
The question I asked
“I understand Hermes agent has self-improvement and learning loops. Can I create an agent who watches over your steps and, after you’re done, writes an MD file and sends it to you?”
What my AI agent came back with
It explained that it already does a basic version of self-improvement — updating its own memory file when it notices something or when I correct it. But it was honest about the gaps:
| Capability | My AI agent — before | My AI agent — now | Hermes |
|---|---|---|---|
| Independent evaluator | ❌ self-graded only | ✓ separate Reviewer agent | ✓ |
| Runs after every session | ❌ only when I noticed | ✓ automated debrief | ✓ |
| Cross-session pattern tracking | ❌ no memory across sessions | ✓ after 5+ sessions | ✓ |
| Permanent weight updates | ❌ config file only | ❌ config file only | ✓ some versions |
The two things the Reviewer agent can’t fix: Hermes can generate genuinely new skills, and some versions update model weights permanently. My agent’s improvements live in a config file, not the model itself.
But the suggestion was practical: create a Reviewer agent that receives a session debrief, writes a structured analysis, and sends paste-ready improvement notes back. That closes the most important gaps — independent evaluation and cross-session pattern tracking.
What I built
I said yes. The Reviewer agent was spun up on the spot:
- Receives session debriefs
- Identifies errors, missed context, what worked
- Writes a structured review saved as a dated markdown file
- Sends paste-ready improvement notes back
- After 5+ sessions, synthesizes patterns across all reviews
What makes this meaningful
The self-grading problem is real. When an AI reviews its own work, it tends to justify the choices it made. Reviewer is a separate agent with no stake in the outputs — it calls out errors without bias.
It’s not Hermes. The loop is semi-manual and there are no weight updates. But the corrections land in the file that shapes every future session. That’s real, compounding improvement.
The question I’m watching: does the same class of error show up in session 2?