A different place to put obligations
Type the action,
not the check.
AI coding agents have one editing primitive: Edit. A
single tool means there is nothing to think about before writing —
typo fix or billing engine, the path is the same. meta-edit
replaces that one tool with twenty-one typed edits,
each carrying its own obligations inside its description. The
obligation now lives in the tool surface itself, where the model
cannot miss it.
The thing that doesn't decay
The usual places to put obligations on an AI agent are all texts the
agent might re-read: a long CLAUDE.md, a Skill,
a system prompt, a review checklist pasted into a comment. They drift
out of attention as the conversation grows. By the time the agent
calls Edit, that text has effectively expired.
One surface is different. The schema and description of the tool the agent is about to call are loaded at every invocation, immediately before the call. They are the only place where an instruction is guaranteed to be in front of the model at the moment of action.
The thesis: if obligations are placed there (in the
tool surface, not in surrounding text) they will hold. To test that,
we need a tool surface rich enough for each obligation to attach to
the right call. A single generic Edit tool is not rich enough; there
is no place to put the obligation "write a boundary test when you
change < to <=" where it would not also apply,
falsely, to a typo fix.
So we split. Edit becomes twenty-one edits, one per kind
of change. Each carries its own obligations in its own description.
And critically, the agent must pick a kind before it can
edit. Picking the kind is the reasoning step.
The twenty-one typed edits
The current set of kinds: 15 SQLite-derived impl tools, the
narrow edit_cosmetic, and 5 workflow-axis kinds.
Each tool's description says when to use it, when not to use
it, which tests must accompany the edit, and when to stop and
ask the user. The 16 impl tools (everything except the 5
workflow-axis kinds) carry a required
target: "prod" | "test" flag, and every
declaration carries a required provenance field
naming the epistemic source of the edit. Workflow kinds may
also declare related files via additional_files
(accepted cell-wise by (kind, provenance));
impl tools never do.
- edit_cosmetic
- edit_boundary_condition
- edit_boolean_condition
- edit_state_transition
- edit_db_schema
- edit_data_migration
- edit_api_contract
- edit_serialization
- edit_error_handling
- edit_retry_timeout
- edit_concurrency
- edit_external_side_effect
- edit_cache_invalidation
- edit_permission_logic
- edit_dependency_config
- edit_policy_change
- edit_progress
- edit_observation
- edit_proposal
- edit_decision
- edit_explanation
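The field rules above can be sketched as a small shape check. This is a minimal illustration, not meta-edit's actual validator: the Declaration class, the shape_errors helper, and the error strings are hypothetical, and only the kind names and field names (target, provenance, additional_files) come from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The five workflow-axis kinds, copied from the list above.
WORKFLOW_KINDS = {
    "edit_progress", "edit_observation", "edit_proposal",
    "edit_decision", "edit_explanation",
}


@dataclass
class Declaration:
    """Hypothetical in-memory shape of one declaration."""
    kind: str
    provenance: Optional[str] = None   # required on every declaration
    target: Optional[str] = None       # "prod" | "test", required on impl tools
    additional_files: List[str] = field(default_factory=list)


def shape_errors(decl: Declaration) -> List[str]:
    """Return the reasons a declaration's shape is invalid (empty list = OK)."""
    errors = []
    is_workflow = decl.kind in WORKFLOW_KINDS
    if not decl.provenance:
        errors.append("provenance is required")
    if not is_workflow and decl.target not in ("prod", "test"):
        errors.append('impl tools require target "prod" or "test"')
    if not is_workflow and decl.additional_files:
        errors.append("impl tools never declare additional_files")
    return errors
```

Under these assumptions, an edit_cosmetic declaration without a target yields one error, while an edit_observation declaration that lists additional_files passes the shape check. Whether a given (kind, provenance) cell accepts those files is a separate matrix lookup that this sketch leaves out.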
What we've observed
When no kind cleanly fits a requested change, the agent
stops and asks, instead of forcing the change
through the nearest tool. We have observed this at ~80% context
utilisation — exactly the regime where instructions in
CLAUDE.md normally lose their grip. Tool-shaped
instructions, reloaded at every call, kept working when text-shaped
instructions did not.
"
OBSERVED-FAILURES.md is a docs file and matches none of the seventeen edit_* tools strictly (descriptions assume 'production code' / 'test files' / 'policy/governance'). CLAUDE.md §9 says 'no edit_* tool fits, stop and ask.' Two options: (a) stretch edit_refactor_only: its MUST-NOT list trivially passes for prose, and the 'no observable behavior change' intent is satisfied; (b) /plugin disable meta-edit and use raw Edit. Which do you prefer?"
How a typed edit actually flows
meta-edit isn't only descriptive. The agent's choice of tool is also gated — the write itself is bound to that choice. Here is the lifecycle of one edit.
1. Pick a tool. The agent selects a kind, e.g. edit_boundary_condition.
2. Declare. It declares the change: target_file, rationale, test_files, plus the pre-edit SHA-256 of each file it intends to touch.
3. Validate. meta-edit checks the declaration. Missing rationale, missing test_files on an impl-tool prod edit (target "prod", anything except edit_cosmetic), a target inside a protected path, or a (kind, target, provenance) cell the matrix rejects → reject. The edit log records phase: "rejected" with the reason.
4. Issue a token. On success, meta-edit returns a short-lived token bound to the declared files, and writes phase: "issued" to .meta-edit/state/edits.jsonl.
5. Write, under two hooks. The agent then calls native Edit to write bytes. The deny-raw-edit hook checks that a matching token exists, hasn't expired, and that each file's hash still matches the binding. A sibling hook, deny-bash-write-bypass, closes the obvious shell escape routes (sed -i, cat > file, git apply, …) so the typed surface stays the only way bytes land in the repo. If anything is off, the write is denied.
6. Consume + audit. A successful write appends phase: "consumed". An issued row without a consumed sibling is evidence of an abandoned or expired declaration.
There is no diff classifier reading the patch contents. The mechanism is choosing the tool and binding the write to that choice. Quality lives in the descriptions and in the discipline of selection — not in detection.
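The lifecycle can be sketched in a few functions. This is an illustration under stated assumptions, not meta-edit's implementation: the in-memory LOG and TOKENS, the "id" field, the 60-second default lifetime, and all function names are hypothetical. Only the phases ("issued", "rejected", "consumed") and the SHA-256 binding come from the description above.

```python
import hashlib
import time
import uuid

LOG = []      # stand-in for .meta-edit/state/edits.jsonl (assumed record shape)
TOKENS = {}   # token id -> binding; the real storage is not described in the text


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def declare(kind, files, rationale, ttl_seconds=60.0, now=None):
    """Validate a declaration and, on success, issue a token bound to the
    pre-edit hash of each declared file. `files` maps path -> current bytes."""
    now = time.time() if now is None else now
    decl_id = uuid.uuid4().hex
    if not rationale:
        LOG.append({"id": decl_id, "phase": "rejected", "reason": "missing rationale"})
        return None
    TOKENS[decl_id] = {
        "kind": kind,
        "hashes": {path: sha256_hex(data) for path, data in files.items()},
        "expires_at": now + ttl_seconds,
    }
    LOG.append({"id": decl_id, "phase": "issued", "kind": kind})
    return decl_id


def check_write(token, path, current_bytes, now=None):
    """What a deny-raw-edit style hook might check before letting bytes land."""
    now = time.time() if now is None else now
    binding = TOKENS.get(token)
    if binding is None or now > binding["expires_at"]:
        return False  # no matching token, or it has expired
    # The file must be declared and must still match its pre-edit hash.
    return binding["hashes"].get(path) == sha256_hex(current_bytes)


def mark_consumed(token):
    """Record a successful write."""
    LOG.append({"id": token, "phase": "consumed"})


def abandoned():
    """Ids of issued declarations that have no consumed sibling."""
    consumed = {row["id"] for row in LOG if row["phase"] == "consumed"}
    return [row["id"] for row in LOG
            if row["phase"] == "issued" and row["id"] not in consumed]
```

In this sketch a write is refused if the file changed after the declaration, if the path was never declared, or if the token has expired. None of the checks looks at the contents of the patch, which matches the point that the mechanism is binding, not detection.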
From typed edits to typed execution
File editing is only the first instance. The same shape applies to any high-stakes verb an AI agent can perform on a real system: deploy, delete, call, commit, grant. The recipe repeats:
- Take one generic, high-stakes verb that an AI agent can invoke.
- Split it by risk category, not by surface syntax.
- Move the obligations of each category into the tool's description — the only place that doesn't decay.
- Let the act of picking the right type do the thinking.
A few sketches of where this leads:
- Typed deploy
- deploy_static_assets, deploy_db_migration, deploy_canary_rollout, deploy_full_rollout. A static-asset deploy is reversible and cheap; a DB migration must be backward-compatible and accompanied by a rollback plan. Each tool's description carries the pre-checks, the required artifacts, and the approvals that kind of deploy requires.
- Typed delete
- delete_temp_artifact, delete_user_record, delete_production_table. The blast radius rises as you move down the list, and so do the obligations encoded in the descriptions: from "none" to "produce a soft-delete-first migration with a documented retention window".
- Typed external call
- call_idempotent_read, call_side_effecting, call_payment. A read is free; a payment cannot be retried blindly. The payment tool's schema can require an idempotency key as a top-level argument, surfacing the obligation structurally rather than as a comment in the prompt.
- Typed commit / revert
- commit_refactor, commit_breaking_change, commit_hotfix, revert. The agent must classify its own change before it can land it; the description for commit_breaking_change requires a changelog entry and a migration note in the same call.
None of this is detection. There is no classifier reading the diff or the payload. The only mechanism is the shape of the tool surface — and the obligation of the agent to choose a tool before acting.
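As one illustration of an obligation carried structurally, here is what a call_payment tool definition might look like, assuming the JSON-Schema style that tool-calling APIs commonly use. The argument names (url, amount_minor_units, idempotency_key), the description text, and the missing_required helper are all hypothetical; only the tool name and the idea of a required top-level idempotency key come from the sketch above.

```python
from typing import Dict, List

# Hypothetical tool definition: the obligation lives in the description
# and in the schema's "required" list.
CALL_PAYMENT_TOOL = {
    "name": "call_payment",
    "description": (
        "Use for calls that move money. Never retry blindly: "
        "reuse the same idempotency_key when repeating a request."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "url": {"type": "string"},
            "amount_minor_units": {"type": "integer"},
            "idempotency_key": {"type": "string", "minLength": 1},
        },
        "required": ["url", "amount_minor_units", "idempotency_key"],
    },
}


def missing_required(tool: Dict, arguments: Dict) -> List[str]:
    """List required top-level arguments that are absent or empty."""
    required = tool["input_schema"]["required"]
    return [name for name in required if arguments.get(name) in (None, "")]
```

A call that omits idempotency_key fails this check before any request is made, so the agent meets the obligation at the moment it fills in the arguments.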
FAQ
- Isn't this just a checker hiding behind a tool?
- No. Nothing reads the diff. The mechanism is the shape of the tool surface and the obligation to pick a tool before writing. We rely on the fact that tool descriptions are loaded at every call — not on any post-hoc analysis.
- Doesn't this slow editing down?
- It adds one explicit step: classifying the change before writing. That step was implicit in the agent's job anyway; meta-edit makes it visible. The metric we care about is "edits that arrived with appropriate tests", not "edits per minute".
- What if the agent picks the wrong type?
- It can. v0.1 deliberately does not detect mismatch. The MVP is built to find out whether descriptions alone change behaviour. If the answer turns out to be "not enough", v0.2 adds a lightweight diff classifier as a backstop; see OBSERVED-FAILURES.md.
- Why twenty-one, not three or fifty?
- Empirical. Each kind corresponds to a failure pattern with distinct testing obligations (boundary, boolean, state-transition, schema, migration, contract, …). Too few collapses different obligations together and dilutes the descriptions; too many splits genuine identicals into duplicate tools. New kinds get added when an observed failure doesn't fit any existing one.
- Does this replace code review?
- No. It changes the pre-edit moment. Review still happens after.
- Why not just write a better CLAUDE.md?
- We tried. The instructions decay during long sessions. The whole project is the consequence of accepting that decay is structural, not the model's fault.
- What does execution_state add?
- Since v0.7.0 every declaration carries execution_state: normal, repeating_failure, or recovery. When the agent notices it's looping on the same file, declaring repeating_failure prepends a reminder to re-read the last log entries before writing again, and records an audit warning. meta-edit summary then exposes a "By execution state" breakdown. Still no detector: the agent names the loop itself.
- What is the spec-derivation matrix?
- Added in v0.8.0 (SPEC §3.3.5). The provenance field already declared where the rationale came from; the matrix cross-references it with target. A test declaration (target: "test") whose rationale is direct_observation of the production code is the impl-mirror smell; it earns a target_spec_derivation_warn in the audit log. Tests grounded in inference or speculation are rejected outright. The point: tests should pin spec-defined behaviour, not whatever the implementation happens to do today.
- What ends up in the audit log?
- Every declaration writes a phase: "issued" or phase: "rejected" record to .meta-edit/state/edits.jsonl; a successful write appends phase: "consumed". Soft signals surface as audit_warnings. Five codes today: kind_provenance_warn (the (kind, provenance) cell warns or rejects), additional_files_warn (a workflow tool declared additional_files against a cell that should reject), citation_lint_missing (accepted_artifact provenance without an artifact reference), execution_state_repeating_failure (an impl-kind edit declared while in a fix→fail spiral instead of the prescribed edit_observation escape), and target_spec_derivation_warn (a test grounded in direct_observation of the implementation). Hard rejections record an audit_error and never issue a token. meta-edit summary aggregates everything; rows without a consumed sibling are evidence of abandoned declarations.
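The test rows of the spec-derivation matrix can be written as a small lookup. This is a sketch under assumptions: the function name and the (outcome, audit_code) return shape are hypothetical, and the treatment of non-test targets and of any provenance value not named in the FAQ above is a guess, marked as such in the comments.

```python
from typing import Optional, Tuple


def spec_derivation_verdict(target: str, provenance: str) -> Tuple[str, Optional[str]]:
    """Return (outcome, audit_code) for one (target, provenance) cell."""
    if target != "test":
        # Assumption: this sketch only models the test rows of the matrix.
        return ("accept", None)
    if provenance in ("inference", "speculation"):
        # Tests grounded in inference or speculation are rejected outright;
        # a hard rejection never issues a token.
        return ("reject", None)
    if provenance == "direct_observation":
        # The impl-mirror smell: accepted, but flagged in the audit log.
        return ("accept", "target_spec_derivation_warn")
    # Assumption: other provenance values (e.g. accepted_artifact) pass cleanly.
    return ("accept", None)
```

For example, spec_derivation_verdict("test", "direct_observation") accepts the declaration but attaches target_spec_derivation_warn, while spec_derivation_verdict("test", "speculation") rejects it.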
When this isn't the right tool
meta-edit is opinionated. The honest list of cases where it adds friction without value:
- You're writing by hand. Tool descriptions don't load in your head when you reach for Edit. meta-edit is built for AI agents.
- It's throwaway prototype code. The twenty-one descriptions insist on test obligations. For code you're about to delete, that obligation is dead weight.
- You trust post-hoc review entirely. meta-edit acts before the write. If "review will catch it" already works downstream, this layer is not strictly needed.
- Your codebase has no notion of "kind of change". The categories assume mainstream application development — boundary checks, state transitions, schema changes. Highly homogeneous domains may need a different split.
Try it
/plugin marketplace add hiniachi/meta-edit
/plugin install meta-edit@meta-edit
These commands use the Claude Code plugin marketplace path. npm and opencode paths are also available; see the README for the full set, and docs/SPEC.md for the design in full.