
Agent Harness

Long-running agent work usually falls apart for a boring reason: the system loses the thread. A harness is how you stop that from happening.

A harness is the small amount of structure around the model that keeps the work legible:

  • persistent task files
  • verification commands
  • clear constraints
  • a repeatable review loop

If the session resets, compacts, or goes sideways, the harness is what lets the next pass pick the work back up without starting from scratch.

You do not need much. For most projects, three files are enough:

The spec: the source of truth for what should be built.

  • requirements
  • acceptance criteria
  • out of scope

The plan: the executable task list.

  • current step
  • remaining steps
  • blockers
  • verification needed

The notes: the compact memory of what already happened.

  • decisions made
  • files touched
  • important constraints
  • unresolved risks
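
As a sketch, the spec and notes files could be as small as this. File names, headings, and contents here are illustrative, not a required layout:

```markdown
<!-- SPEC.md: source of truth -->
# Token refresh

## Requirements
- Refresh expired access tokens transparently

## Acceptance criteria
- Existing login behavior unchanged

## Out of scope
- New auth providers

<!-- NOTES.md: compact memory -->
## Decisions
- Refresh tokens stored server-side only

## Constraints and risks
- Do not change login behavior
- Token rotation interval still undecided
```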

Without a harness, long sessions turn into a pile of stale context, half-finished attempts, and forgotten constraints. With one, the task survives even when the conversation does not.

This matches current harness practice in Anthropic-style long-running agent setups, Codex-style operational harnesses, and open-source agent architecture analyses.

  • Anthropic long-running agents - initializer + coding-agent workflow with durable progress artifacts and incremental commits
  • Codex / AGENTS.md - project instructions and verification commands discovered from the repo itself
  • Cline / implementation plans - structured planning files used before deep execution
  • GitHub Spec Kit / plan.md - project metadata and plan artifacts used to keep agent context aligned
A minimal PLAN.md might look like this:

```markdown
- [x] Define API contract
- [x] Add tests for auth middleware
- [ ] Implement token refresh flow
- [ ] Update docs

## Current focus

Implement token refresh flow without changing login behavior.

## Verification

- `npm test -- auth`
- `npm run build`
```
A few habits keep the harness useful:

  1. Keep files short enough to reread quickly.
  2. Update the harness when the task changes, not hours later.
  3. Put constraints in writing so the next session cannot forget them.
  4. Store the verification commands next to the plan.
  5. Treat specs and plans like code: review them, tighten them, and keep them current.
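
Keeping the verification commands next to the plan also means they can be run mechanically. As a sketch, assuming the plan layout shown above, a small POSIX-shell helper (`run_verification` is a hypothetical name, not a standard tool) can pull every backticked command out of the plan's `## Verification` section and run them in order, stopping at the first failure:

```shell
# run_verification: extract each backticked command from the "## Verification"
# section of a plan file and run it, stopping at the first failure.
# Assumes commands are written as `cmd` list items, as in the example above.
run_verification() {
  plan="${1:-PLAN.md}"
  # Print the Verification section, keep only backticked spans, strip backticks.
  sed -n '/^## Verification/,/^## /p' "$plan" |
    grep -o '`[^`]*`' | tr -d '`' |
    while IFS= read -r cmd; do
      printf '>> %s\n' "$cmd"
      sh -c "$cmd" || exit 1   # exits the pipeline subshell; function returns 1
    done
}
```

At the end of a session (or the start of the next one), `run_verification PLAN.md` re-checks the work against the plan's own criteria instead of relying on memory.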
Use a harness for:

  • multi-session feature work
  • long refactors
  • parallel subagent research
  • tasks where verification has several steps

Skip it for:

  • small one-file edits
  • typo fixes
  • work you will finish before the context gets messy