Agent Harness
Long-running agent work usually falls apart for a boring reason: the system loses the thread. A harness is how you stop that from happening.
What a Harness Is
A harness is the small amount of structure around the model that keeps the work legible:
- persistent task files
- verification commands
- clear constraints
- a repeatable review loop
If the session resets, compacts, or goes sideways, the harness is what lets the next pass pick the work back up without starting from scratch.
The Minimal Harness
You do not need much. For most projects, three files are enough:
spec.md
The source of truth for what should be built.
- requirements
- acceptance criteria
- out of scope
PLAN.md
The executable task list.
- current step
- remaining steps
- blockers
- verification needed
STATE.md
The compact memory of what already happened.
- decisions made
- files touched
- important constraints
- unresolved risks
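As a rough sketch, the three files above could be scaffolded with a few lines of Python. The function name `scaffold_harness` and the exact section headings are assumptions made for illustration; only the file names and the bullet topics come from the lists above.

```python
from pathlib import Path

# Section titles mirror the bullet lists above; the exact wording is an assumption.
HARNESS_TEMPLATES = {
    "spec.md": ["Requirements", "Acceptance criteria", "Out of scope"],
    "PLAN.md": ["Current step", "Remaining steps", "Blockers", "Verification needed"],
    "STATE.md": ["Decisions made", "Files touched", "Important constraints", "Unresolved risks"],
}


def scaffold_harness(root: str = ".") -> list[str]:
    """Create any missing harness files under root and return the names created."""
    root_path = Path(root)
    root_path.mkdir(parents=True, exist_ok=True)
    created = []
    for filename, sections in HARNESS_TEMPLATES.items():
        path = root_path / filename
        if path.exists():
            continue  # never overwrite what an earlier session recorded
        body = "".join(f"## {title}\n\n" for title in sections)
        path.write_text(f"# {filename}\n\n{body}", encoding="utf-8")
        created.append(filename)
    return created
```

Calling `scaffold_harness(".")` at the start of a task creates only the files that are missing, so rerunning it in a later session leaves existing notes intact.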
Why This Matters
Without a harness, long sessions turn into a pile of stale context, half-finished attempts, and forgotten constraints. With one, the task survives even when the conversation does not.
The same pattern shows up in Anthropic's guidance on long-running agents, in Codex's AGENTS.md convention, and in open-source agent tooling such as Cline and GitHub Spec Kit.
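One way to picture the task surviving a reset: a new session starts by rereading the harness files instead of the old conversation. The sketch below assembles them into a single brief. The function name `build_resume_brief`, the `max_chars` cap, and the `===` delimiters are hypothetical choices, not part of any particular agent tool.

```python
from pathlib import Path

# File names come from the minimal harness described earlier.
HARNESS_FILES = ("spec.md", "PLAN.md", "STATE.md")


def build_resume_brief(root: str = ".", max_chars: int = 4000) -> str:
    """Concatenate the harness files into one brief for a fresh session."""
    parts = []
    for name in HARNESS_FILES:
        path = Path(root) / name
        if not path.exists():
            # Flag the gap explicitly rather than silently skipping the file.
            parts.append(f"=== {name} (missing) ===")
            continue
        text = path.read_text(encoding="utf-8").strip()
        if len(text) > max_chars:
            # An assumed cap: a file this long is probably too long to reread quickly.
            text = text[:max_chars] + "\n[truncated]"
        parts.append(f"=== {name} ===\n{text}")
    return "\n\n".join(parts)
```

The returned string can be pasted or piped in as the first message of the next session. A `[truncated]` marker in the output is a hint that the file needs trimming.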
Examples in the Wild
- Anthropic long-running agents - initializer + coding-agent workflow with durable progress artifacts and incremental commits
- Codex / AGENTS.md - project instructions and verification commands discovered from the repo itself
- Cline / implementation plans - structured planning files used before deep execution
- GitHub Spec Kit / plan.md - project metadata and plan artifacts used to keep agent context aligned
Example Pattern
    - [x] Define API contract
    - [x] Add tests for auth middleware
    - [ ] Implement token refresh flow
    - [ ] Update docs

    ## Current focus
    Implement token refresh flow without changing login behavior.

    ## Verification
    - `npm test -- auth`
    - `npm run build`

Rules That Actually Matter
- Keep files short enough to reread quickly.
- Update the harness when the task changes, not hours later.
- Put constraints in writing so the next session cannot forget them.
- Store the verification commands next to the plan.
- Treat specs and plans like code: review them, tighten them, and keep them current.
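Storing verification commands next to the plan pays off when a script can read them back. The sketch below assumes a plan laid out like the example pattern earlier on this page: checkbox tasks, plus a `## Verification` section listing commands in backticks. The names `parse_plan` and `run_verification` are hypothetical.

```python
import re
import subprocess

TASK_RE = re.compile(r"^- \[( |x)\] (.+)$")  # "- [ ] task" or "- [x] task"
COMMAND_RE = re.compile(r"^- `(.+)`$")       # "- `command`"


def parse_plan(text: str) -> dict:
    """Return the first unchecked task and the commands listed under Verification."""
    tasks, commands, section = [], [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            section = line[3:].strip().lower()
            continue
        task = TASK_RE.match(line)
        if task:
            tasks.append((task.group(1) == "x", task.group(2)))
            continue
        command = COMMAND_RE.match(line)
        if command and section == "verification":
            commands.append(command.group(1))
    next_task = next((title for done, title in tasks if not done), None)
    return {"next_task": next_task, "verification": commands}


def run_verification(commands: list[str]) -> bool:
    """Run each command in order and stop at the first failure."""
    for command in commands:
        # shell=True runs the string as written; only use it on plans you trust.
        if subprocess.run(command, shell=True).returncode != 0:
            return False
    return True
```

A session would call `parse_plan` on the contents of PLAN.md, work on `next_task`, and then pass the `verification` list to `run_verification` before marking the step done.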
When to Use a Harness
- multi-session feature work
- long refactors
- parallel subagent research
- tasks where verification has several steps
When You Can Skip It
- small one-file edits
- typo fixes
- work you will finish before the context gets messy
Next Steps
Supporting Evidence
- Anthropic: Effective harnesses for long-running agents
- OpenAI Codex AGENTS.md project-doc implementation
- Cline deep-planning implementation_plan.md prompt
- GitHub Spec Kit agent-context update script
- Context Engineering
- Productivity Research