Shared Contract
Three architectures. One contract. The difference is stage 4. This is the cross-variant reference — the seven-stage spine, eight invariants, the envelope schema that glues the pipeline together, and the stage-4 divergence that makes Cast, Alloy, and Temper distinct while staying fully interchangeable.
Every run walks seven stages
Stages 1–3 and 5–7 are identical across Cast, Alloy, and Temper. Only stage 4 differs. The command files in each variant describe the same agents — Metis at stage 3, Red-Team Trinity at stage 5, Hephaestus at stage 6, Atlas at stage 7.
| Variant | Stage-4 shape |
|---|---|
| Cast | One Prometheus call. Momus audits the single plan. |
| Alloy | N parallel Prometheus-Alloy calls (biased) → Synthesizer blends → Momus audits the blend. |
| Temper | Prometheus-Temper rewrites once per depth; Red-Team Trinity runs at every depth; Momus scores 0–100 per depth; convergence-check.py decides exit. |
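The Temper row leaves the exit decision to convergence-check.py. That script is not reproduced in this reference, so what follows is only a sketch of what a deterministic check of this kind might look like. It is built from the three exit conditions this page gives for Temper: score variance ≤ 3 over the last two depths, score delta ≤ 2 between iterations, and depth ≥ cap. The function names, the use of population variance, the reading of "delta" as signed gain, and the order of the checks are all assumptions.

```python
from statistics import pvariance

VARIANCE_MAX = 3  # "score variance <= 3 over last two depths"
DELTA_MAX = 2     # "score delta <= 2 between iterations"


def convergence_reason(depth, depth_scores, cap):
    """Return 'variance', 'delta', 'cap', or None when the loop should continue.

    Assumptions: population variance over the last two scores, and delta
    read as signed gain (last minus previous), so a regression also exits.
    """
    if len(depth_scores) >= 2:
        prev, last = depth_scores[-2], depth_scores[-1]
        if pvariance([prev, last]) <= VARIANCE_MAX:
            return "variance"  # stable plateau
        if last - prev <= DELTA_MAX:
            return "delta"     # marginal (or negative) gain
    if depth >= cap:
        return "cap"           # hard limit
    return None


def exit_code(depth, depth_scores, cap):
    # 0 means "converged", mirroring the `exit_code == 0` break in the Temper loop.
    return 0 if convergence_reason(depth, depth_scores, cap) else 1
```

Because the result depends only on the recorded scores and the cap, the same inputs always give the same exit, which is the property the Temper loop relies on.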
Non-negotiable constraints
The commands refuse flags that would break these invariants. Invalid plans are re-looped, not accepted. These rules apply identically to every variant; none can be disabled.
Cast: runs once at stage 5.
Alloy: runs once at stage 5 on the blend.
Temper: runs at every depth of the deepen loop.
Dispatch mechanics are load-bearing: in a single assistant message, emit three Task tool calls (one per adversary). Do not set run_in_background: true; that breaks the pipeline.
Evidence quality rules: empty files are INVALID; "Build succeeded" without the log line is INVALID; screenshots of blank pages are INVALID. No mocks, no test files, no stubs. Ever.
XML prompt: an Opus 4.7 semantic-XML prompt per _shared/opus-47-xml-schema.md. One-shot, designed to be pasted into a fresh Claude Code session.
Plan directory: plan/plan.md + plan/phase-NN-*.md for humans to review, edit, and share.
Both ship. The XML is for machines. The plan directory is for humans.
~/.claude/skills/ and the project's .claude/skills/. Matching skills inject automatically: if the project has a functional-validation skill, Hephaestus uses it; if the user has a scout skill, probe uses it.
Injection is name-based: skill names match semantic roles. No config required.
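As an illustration of name-based lookup, the sketch below searches a list of skill roots for a directory whose name matches the requested skill. It assumes each skill lives in its own directory under a root. The helper names and the choice to check the project root before the user root are hypothetical, since this reference does not state a precedence.

```python
from pathlib import Path


def find_skill(name, roots):
    """Return the first directory called `name` under any root, else None."""
    for root in roots:
        candidate = Path(root).expanduser() / name
        if candidate.is_dir():
            return candidate
    return None


def default_roots(project_dir="."):
    # Assumed order: project skills first, then the user's skills.
    return [
        Path(project_dir) / ".claude" / "skills",
        Path.home() / ".claude" / "skills",
    ]
```

A caller would ask for a role's skill by name, for example `find_skill("functional-validation", default_roots())`, and fall back to built-in behavior when the result is `None`.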
Cast: failure folds as a new Metis directive; Prometheus re-plans.
Alloy: full re-loop through Intent Gate; tournament re-runs with new directives.
Temper: reset depth = 0; deepen loop re-runs with augmented directives.
Default cap is 3 iterations; the --loop flag lifts it to unbounded.
Red-Team Trinity always fans out: three Task calls in one message.
Alloy's N planners fan out via xargs -P $(sysctl -n hw.ncpu || nproc).
Temper's Red Team fans out even inside the deepen loop.
Sequential execution is a fallback, not a design choice.
--type ultrabrain | deep | quick. The harness maps the category to a model at runtime. Plugins do not hardcode model identifiers like claude-opus-4-7; they declare category requirements, and the runtime resolves them.
This means model upgrades propagate automatically without touching plugin code.
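One way to picture the category indirection is a lookup that the runtime owns and the plugin never sees. In this sketch the `resolve_model` helper and the table passed to it are hypothetical; only the three category names come from this page.

```python
CATEGORIES = ("ultrabrain", "deep", "quick")


def resolve_model(category, runtime_table):
    """Map a declared category to whatever model the runtime currently offers."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    try:
        return runtime_table[category]
    except KeyError:
        raise LookupError(f"runtime has no model for {category!r}") from None
```

A plugin would declare only `"deep"`; swapping the table entry for `"deep"` changes the model for every plugin at once, with no plugin edits.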
This isolates prompt-format differences (tool-call syntax, reasoning-tag conventions) from the agent's semantic role. Agents are portable; formats are not.
The shared data contract between agents
Every reviewer in Anneal — Metis, Momus, each Red-Team Trinity member, Oracle — returns an envelope in a shared schema. The schema lives in _shared/plan-reviewer-schema.md.
<span class="kw">agent</span>: <span class="var"><agent-name></span> <span class="c"># e.g. "metis", "redteam-security"</span>
<span class="kw">run_id</span>: <span class="var"><run-id></span>
<span class="kw">depth</span>: <span class="var"><int></span> <span class="c"># Temper only; optional</span>
<span class="kw">verdict</span>: <span class="cmd">SAFE</span> | <span class="cmd">CAUTION</span> | <span class="cmd">RISKY</span> | <span class="cmd">BLOCK</span>
<span class="kw">score</span>: <span class="var"><int 0-100></span> <span class="c"># Temper/Momus only; optional</span>
<span class="kw">summary</span>: <span class="var"><string, ≤240 chars></span>
<span class="kw">findings</span>:
- <span class="kw">location</span>: <span class="var"><file-path or section></span>
<span class="kw">severity</span>: <span class="cmd">critical</span> | <span class="cmd">high</span> | <span class="cmd">medium</span> | <span class="cmd">low</span>
<span class="kw">concern</span>: <span class="var"><string></span>
<span class="kw">demand</span>: <span class="var"><imperative sentence></span> <span class="c"># what the planner must do</span>
<span class="kw">directives</span>: <span class="c"># Metis only; imperative sentences</span>
- <span class="var"><string></span>
<span class="kw">clarifying_questions</span>: <span class="c"># BLOCK verdicts only</span>
- <span class="var"><string></span>
<span class="kw">metadata</span>:
<span class="kw">timestamp</span>: <span class="var"><ISO-8601></span>
<span class="kw">reviewer_model</span>: <span class="var"><string></span>
<span class="kw">token_cost</span>: <span class="var"><int></span>
Verdict semantics
| Verdict | Gate behavior |
|---|---|
| SAFE | Proceed. No concerns. |
| CAUTION | Proceed. Record findings; downstream reviewers and Oracle aggregate them. |
| RISKY | Proceed with explicit human override at the Oracle stage. |
| BLOCK | Do not proceed. If clarifying questions are present, surface and ABORT. Otherwise re-loop with findings folded as new Metis directives. |
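The table can be read as a small decision function over one envelope. The sketch below is a rough transcription of it; the function name and the returned action labels are hypothetical, chosen only to make the four rows testable.

```python
def gate_action(verdict, clarifying_questions=()):
    """Translate one envelope's verdict into the gate behavior from the table."""
    if verdict in ("SAFE", "CAUTION"):
        return "proceed"                # CAUTION findings are recorded, not blocking
    if verdict == "RISKY":
        return "proceed_with_override"  # needs explicit human override at Oracle
    if verdict == "BLOCK":
        # Questions present: surface them and abort. Otherwise fold findings and re-loop.
        return "abort" if clarifying_questions else "re_loop"
    raise ValueError(f"unknown verdict: {verdict!r}")
```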
Aggregation across envelopes is worst-of, ordered BLOCK > RISKY > CAUTION > SAFE. Hephaestus's PASS | FAIL maps to SAFE | BLOCK and participates in the same aggregation.
Atlas computes the final rollup
At stage 7, Atlas aggregates all envelopes from the current iteration into a rollup document. The rollup drives the emission decision: EMIT, RE_LOOP, or ABORT.
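The aggregation step can be sketched as a worst-of reduction, using the verdict ordering and the PASS | FAIL mapping stated above. The function and constant names here are hypothetical, and this is not Atlas's actual code.

```python
SEVERITY = {"SAFE": 0, "CAUTION": 1, "RISKY": 2, "BLOCK": 3}
HEPHAESTUS_MAP = {"PASS": "SAFE", "FAIL": "BLOCK"}


def overall_verdict(reviewer_verdicts, hephaestus_result):
    """Worst-of aggregation: the most severe verdict in the iteration wins."""
    verdicts = list(reviewer_verdicts) + [HEPHAESTUS_MAP[hephaestus_result]]
    return max(verdicts, key=SEVERITY.__getitem__)
```

For example, `overall_verdict(["SAFE", "CAUTION"], "PASS")` gives `"CAUTION"`, while any Hephaestus `"FAIL"` forces `"BLOCK"` regardless of the reviewers.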
<span class="kw">rollup</span>:
<span class="kw">run_id</span>: <span class="var"><run-id></span>
<span class="kw">architecture</span>: <span class="cmd">cast</span> | <span class="cmd">alloy</span> | <span class="cmd">temper</span>
<span class="kw">overall_verdict</span>: <span class="cmd">SAFE</span> | <span class="cmd">CAUTION</span> | <span class="cmd">RISKY</span> | <span class="cmd">BLOCK</span>
<span class="kw">gate_status</span>:
<span class="kw">metis</span>: <span class="cmd">SAFE</span> | <span class="cmd">CAUTION</span> | <span class="cmd">RISKY</span> | <span class="cmd">BLOCK</span>
<span class="kw">momus</span>: <span class="cmd">SAFE</span> | <span class="cmd">CAUTION</span> | <span class="cmd">RISKY</span> | <span class="cmd">BLOCK</span>
<span class="kw">red_team_trinity</span>: <span class="var">"N/3 PASS"</span> <span class="c"># e.g. "3/3 PASS"</span>
<span class="kw">oracle</span>: <span class="cmd">SAFE</span> | <span class="cmd">CAUTION</span> | <span class="cmd">RISKY</span> | <span class="cmd">BLOCK</span>
<span class="kw">hephaestus</span>: <span class="cmd">PASS</span> | <span class="cmd">FAIL</span>
<span class="kw">simultaneous_pass</span>: <span class="var"><bool></span>
<span class="kw">emission_decision</span>: <span class="cmd">EMIT</span> | <span class="cmd">RE_LOOP</span> | <span class="cmd">ABORT</span>
<span class="kw">iteration_count</span>: <span class="var"><int></span>
<span class="c"># variant-specific fields:</span>
<span class="kw">depth_final</span>: <span class="var"><int></span> <span class="c"># Temper only</span>
<span class="kw">depth_scores</span>: <span class="var">[<int>, ...]</span> <span class="c"># Temper only</span>
<span class="kw">convergence_reason</span>: <span class="cmd">variance</span> | <span class="cmd">delta</span> | <span class="cmd">cap</span> <span class="c"># Temper only</span>
<span class="kw">bias_set</span>: <span class="var">[<string>, ...]</span> <span class="c"># Alloy only</span>
<span class="kw">synthesis_provenance</span>: <span class="var"><path></span> <span class="c"># Alloy only</span>
Emission logic
<span class="kw">if</span> simultaneous_pass == <span class="cmd">true</span>
<span class="kw">AND</span> overall_verdict <span class="kw">in</span> {<span class="cmd">SAFE</span>, <span class="cmd">CAUTION</span>}:
emission_decision = <span class="cmd">EMIT</span>
<span class="kw">elif</span> overall_verdict == <span class="cmd">BLOCK</span>
<span class="kw">AND</span> Metis.clarifying_questions <span class="kw">is</span> non-empty:
emission_decision = <span class="cmd">ABORT</span> <span class="c"># surface questions, stop</span>
<span class="kw">else</span>:
emission_decision = <span class="cmd">RE_LOOP</span> <span class="c"># fold findings, run again</span>
simultaneous_pass: true. Every gate must land green in the same iteration. If the plan drifted between iterations, the drift shows up here as a re-loop trigger.
Parallel dispatches are load-bearing
The most common mistake in Anneal pipelines is misunderstanding how parallelism works. Dispatch mechanics appear in every command file for good reason.
Do not set run_in_background: true on any of the three Task calls; that makes them fire-and-forget and breaks the pipeline. The Task tool already executes multiple calls in one message concurrently; that is where the parallelism comes from. Wait for ALL THREE envelope responses before invoking Oracle. No partial reviews.
<span class="c"># One message, three Task calls</span>
<span class="c"># Runtime executes concurrently</span>
<span class="cmd">Task</span>(<span class="kw">agent</span>=<span class="var">"redteam-security"</span>, ...)
<span class="cmd">Task</span>(<span class="kw">agent</span>=<span class="var">"redteam-scope"</span>, ...)
<span class="cmd">Task</span>(<span class="kw">agent</span>=<span class="var">"redteam-assumptions"</span>, ...)
<span class="c"># wait for all three</span>
<span class="cmd">Oracle</span>(envelopes=[s, sc, a])
<span class="c"># run_in_background: true breaks pipeline</span>
<span class="c"># dispatches return immediately (no result)</span>
<span class="cmd">Task</span>(<span class="var">"redteam-security"</span>,
<span class="kw">run_in_background</span>=<span class="cmd">true</span>) <span class="c"># ← WRONG</span>
<span class="cmd">Task</span>(<span class="var">"redteam-scope"</span>,
<span class="kw">run_in_background</span>=<span class="cmd">true</span>) <span class="c"># ← WRONG</span>
<span class="c"># Oracle gets empty inputs</span>
<span class="cmd">Oracle</span>() <span class="c"># reports 3/3 PASS, all empty</span>
The guard against the wrong pattern is twofold: the explicit dispatch note in every command spec, and the emission gate's simultaneous_pass check. Empty envelopes fail the simultaneous-pass check because they cannot produce a valid verdict, so the rollup triggers a re-loop rather than an emit, surfacing the pipeline error instead of silently shipping an unreviewed plan.
The same single-message pattern applies to Alloy's N-variant fan-out. The orchestrator uses xargs -P for CLI parallelism at the shell level, but inside the agent execution context, multiple Task calls in one message is the primitive.
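The "dispatch together, wait for all" shape has a close analogue in ordinary async code. The sketch below uses Python's asyncio as a stand-in for the Task tool, which it does not call; the `review` coroutine is a hypothetical placeholder for one adversary. The point it illustrates is that the gather step returns only once every reviewer has produced an envelope, and that an empty envelope is treated as an error rather than a pass.

```python
import asyncio

ADVERSARIES = ("redteam-security", "redteam-scope", "redteam-assumptions")


async def review(agent, plan):
    # Stand-in for one Task call; a real reviewer would return a full envelope.
    await asyncio.sleep(0)
    return {"agent": agent, "verdict": "SAFE"}


async def red_team(plan):
    # gather() starts all three reviews and resolves only when every one is done.
    envelopes = await asyncio.gather(*(review(a, plan) for a in ADVERSARIES))
    if len(envelopes) != len(ADVERSARIES) or any(not e.get("verdict") for e in envelopes):
        raise RuntimeError("missing or empty envelope; refusing to call Oracle")
    return list(envelopes)


envelopes = asyncio.run(red_team("plan.md"))
```

The fire-and-forget anti-pattern corresponds to scheduling the three coroutines and moving on without awaiting them, which is exactly the case the empty-envelope check is there to catch.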
Where the variants diverge
Cast — single pass
<span class="c"># Stage 4: single planner, single auditor</span>
<span class="cmd">Prometheus-Cast</span>(task, metis_directives, probe_report)
→ plan.md + phase-*.md
<span class="cmd">Momus</span>(plan) → envelope
<span class="kw">if</span> momus.verdict == <span class="cmd">BLOCK</span>:
fold findings as Metis directive
re-loop once <span class="c"># then escalate</span>
Alloy — tournament
<span class="c"># Stage 4: N biased planners run in parallel</span>
bias_set = <span class="cmd">select_biases</span>(N)
<span class="c"># e.g. ["correctness","minimalist","defensive","performance","ux"]</span>
<span class="c"># parallel via xargs -P $(sysctl -n hw.ncpu || nproc)</span>
<span class="kw">for</span> bias <span class="kw">in</span> bias_set:
<span class="cmd">Prometheus-Alloy</span>(task, metis_directives, probe_report,
<span class="kw">bias</span>=bias)
→ variant-{i}-{bias}.md
<span class="c"># wait for all N variants</span>
<span class="cmd">Synthesizer</span>(variants, metis_directives, probe_report)
→ plan.md + phase-*.md + synthesis-provenance.md
<span class="cmd">Momus</span>(plan) → envelope <span class="c"># audits the blend, NOT the variants</span>
<span class="kw">if</span> momus.verdict == <span class="cmd">BLOCK</span>:
regenerate tournament with Momus findings as constraints
max 2 stage-4 re-loops, then escalate to full re-loop
Temper — deepen loop
<span class="c"># Stage 4: fixed-point deepen loop</span>
depth = 0
depth_scores = []
<span class="kw">loop</span>:
<span class="kw">if</span> depth == 0:
plan_0 = <span class="cmd">Prometheus-Temper</span>(task, metis_directives, probe_report)
<span class="kw">else</span>:
plan_N = <span class="cmd">Prometheus-Temper</span>(
task, metis_directives, probe_report,
<span class="kw">prior_plan</span>=plan_{N-1},
<span class="kw">prior_momus</span>=momus_envelope_{N-1},
<span class="kw">prior_redteam</span>=redteam_envelopes_{N-1},
<span class="kw">depth_scores</span>=depth_scores
)
<span class="c"># Red Team fans out INSIDE the loop (3 Task calls, one message)</span>
redteam_envelopes_N = [<span class="cmd">redteam-security</span>,
<span class="cmd">redteam-scope</span>,
<span class="cmd">redteam-assumptions</span>](plan_N)
momus_envelope_N = <span class="cmd">Momus</span>(plan_N) <span class="c"># includes score 0-100</span>
depth_scores.append(momus_envelope_N.score)
exit_code = <span class="cmd">convergence-check.py</span>(depth, depth_scores, cap=N)
<span class="kw">if</span> exit_code == 0:
<span class="kw">break</span> <span class="c"># converged — exit is DETERMINISTIC, not LLM-decided</span>
depth += 1
plan_final = plan_N
<span class="c"># On Hephaestus FAIL: reset depth = 0, route back to stage 3</span>
convergence-check.py decides, not an LLM. The three exit conditions are: score variance ≤ 3 over the last two depths (stable plateau), score delta ≤ 2 between iterations (marginal gain), and depth ≥ cap (hard limit). On Hephaestus FAIL, Temper resets depth = 0 and routes back to stage 3 (Enrich) with the failure folded into Metis directives.
Credit where it's due
Anneal's architecture pulls from several sources. None of this was invented from scratch.
| Source | What Anneal borrowed |
|---|---|
| oh-my-openagent | The Greek-god agent taxonomy (Metis, Momus, Oracle, Prometheus, Hephaestus, Atlas). Verdict tiers (SAFE / CAUTION / RISKY / BLOCK) and the parallel-agent review pattern are borrowed wholesale. |
| Aider | Terminal-first ergonomics. Zero-ceremony invocation. Anneal is plan-first rather than edit-first but shares the "just type and go" philosophy. |
| Ralph | The unbounded-re-loop discipline. "The boulder never stops." Anneal's stage-7 simultaneous-pass gate is Ralph-shaped: never emit a partial result, always loop until coherence. |
| SADD (context-engineering-kit) | The primitive vocabulary (launch-sub-agent, do-in-parallel, do-and-judge, tree-of-thoughts) that Temper in particular composes. The deepen loop is SADD's do-and-judge wrapped in a convergence check. |
| ValidationForge | Hephaestus is a ValidationForge runner. The evidence quality rules, the no-mocks mandate, and the preflight discipline all come from VF. |
| multi-agent-consensus | Alloy's tournament is intellectually adjacent to multi-agent-consensus. Where consensus runs three agents as a unanimous gate at execution time, Alloy runs N agents as a consensus-blend at planning time. |
"New variants can be added at stage 4 alone, with no changes to stages 1–3 or 5–7. This is the same philosophy that makes Unix pipelines composable."Shared Contract · Anneal v0.1.0