A different place to put obligations

Type the action,
not the check.

AI coding agents have one editing primitive: Edit. A single tool means there is nothing to think about before writing — typo fix or billing engine, the path is the same. meta-edit replaces that one tool with twenty-one typed edits, each carrying its own obligations inside its description. The obligation now lives in the tool surface itself, where the model cannot miss it.

The thing that doesn't decay

The usual places to put obligations on an AI agent are all texts the agent might re-read: a long CLAUDE.md, a Skill, a system prompt, a review checklist pasted into a comment. They drift out of attention as the conversation grows. By the time the agent calls Edit, that text has effectively expired.

One surface is different. The schema and description of the tool the agent is about to call are loaded at every invocation, immediately before the call. They are the only place where an instruction is guaranteed to be in front of the model at the moment of action.

The thesis: if obligations are placed there, in the tool surface rather than in surrounding text, they will hold. Testing that requires a tool surface fine-grained enough to hold each obligation. A single generic Edit tool is not: there is no place to put the obligation "write a boundary test when you change < to <=" without it also applying, falsely, to a typo fix.

So we split. Edit becomes twenty-one edits, one per kind of change. Each carries its own obligations in its own description. And critically, the agent must pick a kind before it can edit. Picking the kind is the reasoning step.

The twenty-one typed edits

The current set of kinds: 15 SQLite-derived impl tools, the narrow edit_cosmetic, and 5 workflow-axis kinds. Each tool's description says when to use it, when not to use it, which tests must accompany the edit, and when to stop and ask the user. The 16 impl tools (everything except the 5 workflow-axis kinds) carry a required target: "prod" | "test" flag, and every declaration carries a required provenance field naming the epistemic source of the edit. Workflow kinds may also declare related files via additional_files, which is accepted or refused per (kind, provenance) cell; impl tools never do.
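
The structural rules in this paragraph can be sketched in a few lines of Python. The tool names come from this page; the `Declaration` class, its field layout, and the `check_shape` helper are assumptions made for illustration, not meta-edit's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# The five workflow-axis kinds named on this page; every other kind is an impl tool.
WORKFLOW_KINDS = {
    "edit_progress", "edit_observation", "edit_proposal",
    "edit_decision", "edit_explanation",
}


@dataclass
class Declaration:
    kind: str                      # which typed edit the agent picked
    provenance: str                # epistemic source of the edit (always required)
    target: Optional[str] = None   # "prod" | "test"; required for impl tools
    additional_files: list[str] = field(default_factory=list)


def check_shape(d: Declaration) -> list[str]:
    """Return structural problems; an empty list means the shape is acceptable."""
    problems = []
    if not d.provenance:
        problems.append("provenance is required")
    if d.kind in WORKFLOW_KINDS:
        return problems  # workflow kinds may carry additional_files
    if d.target not in ("prod", "test"):
        problems.append('impl tools require target "prod" or "test"')
    if d.additional_files:
        problems.append("impl tools never declare additional_files")
    return problems
```

This only checks shape. Whether a given (kind, provenance) cell accepts additional_files is a separate matrix lookup that the sketch leaves out.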

What we've observed

When no kind cleanly fits a requested change, the agent stops and asks, instead of forcing the change through the nearest tool. We have observed this at ~80% context utilisation — exactly the regime where instructions in CLAUDE.md normally lose their grip. Tool-shaped instructions, reloaded at every call, kept working when text-shaped instructions did not.

"OBSERVED-FAILURES.md is a docs file and matches none of the seventeen edit_* tools strictly (descriptions assume 'production code' / 'test files' / 'policy/governance'). CLAUDE.md §9 says 'no edit_* tool fits, stop and ask.' Two options: (a) stretch edit_refactor_only — its MUST-NOT list trivially passes for prose, and the 'no observable behavior change' intent is satisfied; (b) /plugin disable meta-edit and use raw Edit. Which do you prefer?"

— agent transcript that led to the eighteenth tool edit_docs_only, later refined in v0.6.0 into the five workflow-axis kinds (edit_progress, edit_observation, edit_proposal, edit_decision, edit_explanation)

How a typed edit actually flows

meta-edit isn't only descriptive. The agent's choice of tool is also gated — the write itself is bound to that choice. Here is the lifecycle of one edit.

  1. Pick a tool

    The agent selects a kind, e.g. edit_boundary_condition.

  2. Declare

    It declares the change: target_file, rationale, test_files, plus the pre-edit SHA-256 of each file it intends to touch.

  3. Validate

    meta-edit checks the declaration. Missing rationale, missing test_files on an impl-tool prod edit (target "prod", anything except edit_cosmetic), a target inside a protected path, or a (kind, target, provenance) cell the matrix rejects → reject. The edit log records phase: "rejected" with the reason.

  4. Issue a token

    On success, meta-edit returns a short-lived token bound to the declared files, and writes phase: "issued" to .meta-edit/state/edits.jsonl.

  5. Write — under two hooks

    The agent then calls native Edit to write bytes. The deny-raw-edit hook checks that a matching token exists, hasn't expired, and that each file's hash still matches the binding. A sibling hook, deny-bash-write-bypass, closes the obvious shell escape routes (sed -i, cat > file, git apply, …) so the typed surface stays the only way bytes land in the repo. If anything is off, the write is denied.

  6. Consume + audit

    A successful write appends phase: "consumed". An issued row without a consumed sibling is evidence of an abandoned or expired declaration.
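
The declare, validate, and issue steps above can be sketched as follows. This is a simplified illustration, not meta-edit's code: the function names, the row fields other than `phase`, and the 300-second lifetime are assumptions, and the protected-path and (kind, target, provenance) matrix checks are omitted.

```python
import hashlib
import json
import secrets
import time
from pathlib import Path

TOKEN_TTL_SECONDS = 300  # assumed lifetime; the page only says "short-lived"


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file's current bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def declare(kind, target, target_file, rationale, test_files):
    """Build a declaration carrying the pre-edit hash of each touched file."""
    files = [target_file, *test_files]
    return {
        "kind": kind, "target": target, "target_file": target_file,
        "rationale": rationale, "test_files": list(test_files),
        "hashes": {f: sha256_of(Path(f)) for f in files},
    }


def validate(decl):
    """Return a rejection reason, or None if the declaration passes."""
    if not decl["rationale"].strip():
        return "missing rationale"
    needs_tests = decl["target"] == "prod" and decl["kind"] != "edit_cosmetic"
    if needs_tests and not decl["test_files"]:
        return "missing test_files"
    return None


def submit(decl, log_path: Path):
    """Append a rejected or issued row to the log; return a token on success."""
    reason = validate(decl)
    if reason:
        row = {"phase": "rejected", "kind": decl["kind"], "reason": reason}
        token = None
    else:
        token = secrets.token_hex(16)
        row = {"phase": "issued", "kind": decl["kind"], "token": token,
               "hashes": decl["hashes"],
               "expires_at": time.time() + TOKEN_TTL_SECONDS}
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(row) + "\n")
    return token
```

Both outcomes reach the log, so a rejected declaration is as visible in the audit trail as an issued one.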

There is no diff classifier reading the patch contents. The mechanism is choosing the tool and binding the write to that choice. Quality lives in the descriptions and in the discipline of selection — not in detection.
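
The write-side gate checks bindings, not patch contents, which a short Python sketch makes concrete. `may_write`, `is_bash_bypass`, the token fields, and the regular expressions are assumptions for illustration. The three shell patterns are only the examples named on this page, not the real hook's full list.

```python
import hashlib
import re
import time

# Shell patterns taken from the examples on this page; a real hook would need more.
BYPASS_PATTERNS = [r"\bsed\s+-i\b", r"\bcat\s*>", r"\bgit\s+apply\b"]


def may_write(token, path, current_bytes, now=None):
    """Allow a native Edit only if the token covers this file and is still valid."""
    now = time.time() if now is None else now
    if token is None:
        return False, "no matching token"
    if now >= token["expires_at"]:
        return False, "token expired"
    bound = token["hashes"].get(path)
    if bound is None:
        return False, "file not in declaration"
    if hashlib.sha256(current_bytes).hexdigest() != bound:
        return False, "file changed since declaration"
    return True, "ok"


def is_bash_bypass(command):
    """True if a shell command looks like a write that skips the typed surface."""
    return any(re.search(p, command) for p in BYPASS_PATTERNS)
```

Nothing here inspects what the edit changes. The gate only asks whether a declaration exists for this file and whether the file is still in the state that was declared.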

From typed edits to typed execution

File editing is only the first instance. The same shape applies to any high-stakes verb an AI agent can perform on a real system: deploy, delete, call, commit, grant. The recipe repeats:

  1. Take one generic, high-stakes verb that an AI agent can invoke.
  2. Split it by risk category, not by surface syntax.
  3. Move the obligations of each category into the tool's description — the only place that doesn't decay.
  4. Let the act of picking the right type do the thinking.

A few sketches of where this leads:

Typed deploy
deploy_static_assets, deploy_db_migration, deploy_canary_rollout, deploy_full_rollout. A static-asset deploy is reversible and cheap; a DB migration must be backward-compatible and accompanied by a rollback plan. Each tool's description carries the pre-checks, the required artifacts, and the approvals that kind of deploy requires.
Typed delete
delete_temp_artifact, delete_user_record, delete_production_table. The blast radius rises from the first tool to the last, and so do the obligations encoded in the descriptions: from "none" to "produce a soft-delete-first migration with a documented retention window".
Typed external call
call_idempotent_read, call_side_effecting, call_payment. A read is free; a payment cannot be retried blindly. The payment tool's schema can require an idempotency key as a top-level argument, surfacing the obligation structurally rather than as a comment in the prompt.
Typed commit / revert
commit_refactor, commit_breaking_change, commit_hotfix, revert. The agent must classify its own change before it can land it; the description for commit_breaking_change requires a changelog entry and a migration note in the same call.
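
To make the typed external call sketch concrete, here is one way a call_payment tool could surface the idempotency key structurally. The tool definition is hypothetical: the field names, the description wording, and the `missing_required` helper are assumptions, and the helper covers only the `required` keyword of JSON Schema.

```python
# Hypothetical tool definition; names and fields are illustrative only.
CALL_PAYMENT_TOOL = {
    "name": "call_payment",
    "description": (
        "Use for any request that moves money. "
        "MUST NOT be retried without the same idempotency_key."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "endpoint": {"type": "string"},
            "amount_minor_units": {"type": "integer", "minimum": 1},
            "currency": {"type": "string"},
            "idempotency_key": {"type": "string", "minLength": 1},
        },
        "required": ["endpoint", "amount_minor_units", "currency", "idempotency_key"],
    },
}


def missing_required(tool, arguments):
    """List required arguments the caller left out (a tiny subset of JSON Schema)."""
    required = tool["input_schema"]["required"]
    return [name for name in required if name not in arguments]
```

Because the key is a required top-level argument, a call without it fails on shape before any payment logic runs.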

None of this is detection. There is no classifier reading the diff or the payload. The only mechanism is the shape of the tool surface — and the obligation of the agent to choose a tool before acting.

FAQ

Isn't this just a checker hiding behind a tool?
No. Nothing reads the diff. The mechanism is the shape of the tool surface and the obligation to pick a tool before writing. We rely on the fact that tool descriptions are loaded at every call — not on any post-hoc analysis.
Doesn't this slow editing down?
It adds one explicit step: classifying the change before writing. That step was implicit in the agent's job anyway; meta-edit makes it visible. The metric we care about is "edits that arrived with appropriate tests", not "edits per minute".
What if the agent picks the wrong type?
It can. v0.1 deliberately does not detect mismatch. The MVP is built to find out whether descriptions alone change behaviour. If the answer turns out to be "not enough", v0.2 adds a lightweight diff classifier as a backstop; see OBSERVED-FAILURES.md.
Why twenty-one, not three or fifty?
Empirical. Each kind corresponds to a failure pattern with distinct testing obligations (boundary, boolean, state-transition, schema, migration, contract, …). Too few kinds collapse different obligations together and dilute the descriptions; too many split genuinely identical cases across duplicate tools. New kinds get added when an observed failure doesn't fit any existing one.
Does this replace code review?
No. It changes the pre-edit moment. Review still happens after.
Why not just write a better CLAUDE.md?
We tried. The instructions decay during long sessions. The whole project is the consequence of accepting that decay is structural — not the model's fault.
What does execution_state add?
Since v0.7.0 every declaration carries execution_state: normal, repeating_failure, or recovery. When the agent notices it's looping on the same file, declaring repeating_failure prepends a reminder to re-read the last log entries before writing again, and records an audit warning. meta-edit summary then exposes a By execution state breakdown. Still no detector — the agent names the loop itself.
What is the spec-derivation matrix?
Added in v0.8.0 (SPEC §3.3.5). The provenance field already declared where the rationale came from; the matrix cross-references it with target. A test declaration (target: "test") whose rationale is direct_observation of the production code is the impl-mirror smell — it earns a target_spec_derivation_warn in the audit log. Tests grounded in inference or speculation are rejected outright. The point: tests should pin spec-defined behaviour, not whatever the implementation happens to do today.
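
A minimal sketch of the test-target cells just described, in Python. Only the three cells named in this answer are encoded. Treating every other cell as accepted, and letting non-test targets pass, are assumptions; the full matrix lives in SPEC §3.3.5.

```python
# Cells described in this answer; any cell not listed is assumed to be accepted.
TEST_TARGET_CELLS = {
    "direct_observation": "warn",   # the impl-mirror smell
    "inference": "reject",
    "speculation": "reject",
}


def spec_derivation_verdict(target, provenance):
    """Return (verdict, audit code), where verdict is accept, warn, or reject."""
    if target != "test":
        return "accept", None  # assumption: this check only concerns test targets
    verdict = TEST_TARGET_CELLS.get(provenance, "accept")
    code = "target_spec_derivation_warn" if verdict == "warn" else None
    return verdict, code
```
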
What ends up in the audit log?
Every declaration writes a phase: "issued" or phase: "rejected" record to .meta-edit/state/edits.jsonl; a successful write appends phase: "consumed". Soft signals surface as audit_warnings. Five codes today: kind_provenance_warn (the (kind, provenance) cell warns or rejects), additional_files_warn (a workflow tool declared additional_files against a cell that should reject), citation_lint_missing (accepted_artifact provenance without an artifact reference), execution_state_repeating_failure (an impl-kind edit declared while in a fix→fail spiral instead of the prescribed edit_observation escape), and target_spec_derivation_warn (a test grounded in direct_observation of the implementation). Hard rejections record an audit_error and never issue a token. meta-edit summary aggregates everything; rows without a consumed sibling are evidence of abandoned declarations.
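
The kind of aggregation described here can be sketched over the JSONL log in Python. Joining issued and consumed rows on a `token` field, and storing `audit_warnings` as a list of codes, are assumptions about the row layout; this is not the actual meta-edit summary implementation.

```python
import json
from collections import Counter


def summarize(jsonl_text):
    """Count phases and warning codes, and list issued tokens never consumed."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    phases = Counter(row["phase"] for row in rows)
    issued = [r["token"] for r in rows if r["phase"] == "issued"]
    consumed = {r["token"] for r in rows if r["phase"] == "consumed"}
    abandoned = [t for t in issued if t not in consumed]
    warnings = Counter(w for r in rows for w in r.get("audit_warnings", []))
    return {"phases": dict(phases), "abandoned": abandoned, "warnings": dict(warnings)}
```

Run against the contents of .meta-edit/state/edits.jsonl, the `abandoned` list holds the issued rows that have no consumed sibling.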

When this isn't the right tool

meta-edit is opinionated. The honest list of cases where it adds friction without value:

Try it

/plugin marketplace add hiniachi/meta-edit
/plugin install meta-edit@meta-edit

The commands above use the Claude Code plugin marketplace. npm and opencode install paths are also available; see the README for the full set, and docs/SPEC.md for the design in full.