Anneal · Reference

Three Pipelines, One Contract

Cast, Alloy, and Temper share stages 1–3 and 5–7 exactly. Only stage 4 diverges. The eight invariants apply to all three with no exceptions. Swap variants on the same task and the output contract stays identical.

Same seven stages. Stage 4 is the variable.

Stage 1
Intent Gate
Classify task · Reject unsafe inputs
Stage 2
Probe
Scan codebase · Enumerate skills · Read docs
Stage 3
Enrich
Metis flags ambiguity + emits directives
Stage 4
Plan
VARIANT-SPECIFIC · Cast / Alloy / Temper
Stage 5
Review
Red-Team Trinity → Oracle synthesis
Stage 6
Validate
Hephaestus builds + exercises artifact
Stage 7
Emit
Atlas writes XML + plan dir OR re-loops
| Variant | Stage-4 shape | Momus audits |
| --- | --- | --- |
| Cast | One Prometheus call (linear, single planner) | The single plan |
| Alloy | N parallel Prometheus-Alloy calls (biased) → Synthesizer blends → one plan | The blend, not the individual variants |
| Temper | Prometheus-Temper rewrites once per depth; Red-Team runs at every depth; convergence-check.py decides exit | One Momus score per depth (0–100); convergence is deterministic |
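
Temper's exit is described as deterministic: one Momus score per depth goes in, a converged / not-converged decision comes out. The actual rule inside convergence-check.py is not shown on this page. The sketch below assumes a hypothetical rule (stop when the latest score reaches a threshold, or when the gain over the previous depth falls below a minimum). The function name, `threshold`, and `min_gain` are illustrative assumptions, not the shipped implementation.

```python
from typing import Sequence


def has_converged(scores: Sequence[int], threshold: int = 90, min_gain: int = 2) -> bool:
    """Decide whether the deepen loop may exit (hypothetical rule).

    scores holds one Momus score (0-100) per completed depth, oldest first.
    The same scores always produce the same answer, so the decision is
    deterministic.
    """
    if not scores:
        return False  # nothing scored yet, so keep deepening
    if any(not 0 <= s <= 100 for s in scores):
        raise ValueError("Momus scores must be in the range 0..100")
    if scores[-1] >= threshold:
        return True  # assumed exit: plan is good enough
    if len(scores) >= 2 and scores[-1] - scores[-2] < min_gain:
        return True  # assumed exit: another depth no longer helps
    return False
```

Because the function reads only the score history, re-running it on the same depth history always gives the same exit decision.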

Three architectures, side by side

Cast — Linear 9-agent pipeline

Intent Gate (S1) → Probe (S2) → Enrich (S3) → Prometheus-Cast (S4 · single planner) → Red-Team Trinity (S5 · parallel ×3) → Oracle (S5b) → Hephaestus (S6) → Atlas (S7)

Alloy — Tournament at stage 4

Intent Gate → Probe → Enrich → [Prometheus (bias=correctness) ‖ Prometheus (bias=minimalist) ‖ Prometheus (bias=…N)] → SYNTH (blend + prov.) → Red-Team ×3 → Hephaestus → Atlas

Temper — Deepen loop at stage 4

Intent Gate → Probe → Enrich → [deepen loop, repeat per depth: Prometheus-Temper → Red-Team ×3 → Momus (score 0–100) → conv-check.py (re-loop if not converged)] → Hephaestus

Non-negotiable. All three variants. No exceptions.

The commands refuse flags that would break these invariants. Invalid plans are re-looped, not accepted. Refused flags are logged and ignored.

Invariant 01
Red Team Trinity always runs
Security · Scope · Assumptions — three parallel adversaries. No flag disables this. Cast runs red team once at stage 5. Alloy runs it once after synthesis. Temper runs it at every depth.
Invariant 02
Functional validation always runs
Hephaestus builds the real artifact in a scratch worktree, captures real build output, and compares against success criteria. No mocks, no stubs, no test files. Ever. Empty evidence files are invalid.
Invariant 03
Dual output — XML + plan directory
Every successful run produces two outputs: {variant}-{run_id}.xml, an Opus 4.7 semantic-XML prompt, and a plan/ markdown directory. Both ship. XML is for machines; the plan directory is for humans.
Invariant 04
Skill enrichment
The probe stage scans ~/.claude/skills/ and the project's .claude/skills/. Matching skills are injected automatically: skill names are matched against semantic roles.
Invariant 05
Unbounded re-loop on FAIL
A failed run folds its failure into the next run's constraints. Cast folds it in as a Metis directive; Alloy re-runs the tournament via Intent Gate; Temper resets depth=0 and re-runs the deepen loop. The default cap is 3 iterations; --loop lifts the cap so re-looping is unbounded.
Invariant 06
Parallelization by default
Red Team Trinity fans out (three Task calls in one message). Alloy's N planners fan out via xargs -P. Temper's Red Team fans out inside the deepen loop. Sequential is a fallback, not a design choice.
Invariant 07
Category routing, not model picking
The user specifies --type ultrabrain|deep|quick. The harness maps to a model. Plugins declare category requirements; the runtime resolves them. No hardcoded model names.
Invariant 08
Dual prompts by model family
Agents ship Claude-flavored and GPT-flavored prompts in the same file. The runtime picks at dispatch time. This isolates prompt-format differences from the agent's semantic role.
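
Invariant 05 can be sketched as a small loop: each failed run adds its failure to the constraints handed to the next run, capped at 3 iterations unless the cap is lifted. This is a minimal sketch under stated assumptions. `run_with_reloop`, `run_once`, and the use of `None` to model --loop are hypothetical names chosen for illustration, not the harness's actual API.

```python
from typing import Callable, List, Optional, Tuple

# A run takes the accumulated constraints and returns (passed, failure_reason).
RunOnce = Callable[[List[str]], Tuple[bool, str]]


def run_with_reloop(
    run_once: RunOnce,
    max_iterations: Optional[int] = 3,  # None models --loop (no cap)
) -> Tuple[bool, List[str], int]:
    """Re-run until PASS, folding each failure into the next run's constraints."""
    constraints: List[str] = []
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        # Pass a copy so a run cannot mutate the accumulated history.
        passed, failure_reason = run_once(list(constraints))
        if passed:
            return True, constraints, iteration
        constraints.append(failure_reason)  # the failure becomes a constraint
    return False, constraints, iteration
```

With the default cap, a run that never passes stops after three attempts and returns the three recorded failures; with `max_iterations=None` the loop continues until a run passes.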

Parallelism comes from batching, not background

The Red-Team Trinity fan-out is load-bearing. In a single assistant message, three Task tool calls are emitted — one per adversary. Do NOT set run_in_background: true on any of them. That makes dispatches fire-and-forget and breaks the pipeline: Oracle gets invoked with empty inputs, and the rollup reports 3/3 PASS with no actual review.

The common mistake: setting run_in_background: true on red-team dispatches, thinking it improves parallelism. It doesn't. The Task tool already executes multiple calls in one message concurrently. The guard against this is the emission gate's simultaneous_pass check — which fails if envelopes are empty.
pseudocode · correct dispatch pattern
# CORRECT — three synchronous Task calls in one message
# Runtime executes them concurrently
[single assistant message]:
  Task(redteam-security, plan_N)     # run_in_background: false (default)
  Task(redteam-scope, plan_N)        # run_in_background: false
  Task(redteam-assumptions, plan_N)  # run_in_background: false

# WRONG — fires and forgets; Oracle has no input
[single assistant message]:
  Task(redteam-security, plan_N,   run_in_background: true)  # ← breaks pipeline
  Task(redteam-scope, plan_N,      run_in_background: true)
  Task(redteam-assumptions, plan_N,run_in_background: true)
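
The same idea can be shown outside the Task tool. The Python analogy below runs three reviewers concurrently but still waits for every result before anything downstream sees them, then applies a gate that rejects empty envelopes. It is an illustration only: the Task tool is not a Python API, and `Envelope`, `dispatch_trinity`, and the body of `simultaneous_pass` are assumptions. Only the check's name comes from the text above.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Envelope:
    adversary: str
    verdict: str   # "PASS" or "FAIL"
    findings: str  # review text; empty means no review actually happened


Reviewer = Callable[[str], Envelope]


def dispatch_trinity(reviewers: Dict[str, Reviewer], plan: str) -> List[Envelope]:
    """Run all reviewers concurrently, then block until every one returns."""
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = [pool.submit(review, plan) for review in reviewers.values()]
        # result() waits for completion: concurrent, but not fire-and-forget.
        return [future.result() for future in futures]


def simultaneous_pass(envelopes: List[Envelope], expected: int = 3) -> bool:
    """Assumed gate logic: every adversary reported, with content, and passed."""
    if len(envelopes) != expected:
        return False
    if any(not e.findings.strip() for e in envelopes):
        return False  # an empty envelope is treated as a failed review
    return all(e.verdict == "PASS" for e in envelopes)
```

Concurrency here comes from submitting the three calls together, while `result()` guarantees the synthesis step never starts with missing inputs.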

Identical upstream. Comparable downstream.

Keeping stages 1–3 and 5–7 identical across variants means: if Alloy produces a stronger plan than Cast on the same task, the difference is genuinely in the tournament — not in upstream noise from different probe strategies or downstream noise from different red teams.

It also means users can swap variants without relearning. The command signatures are symmetric. The flags mean the same things. Output directories are structurally identical except for variant-specific additions (variants/ for Alloy, depth-history.json for Temper).

And it means new variants can be added at stage 4 alone — a hypothetical anneal-foundry that fine-tunes a local model on the task shape would slot in at stage 4 with no changes to stages 1–3 or 5–7. This is the same philosophy that makes Unix pipelines composable.
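
One way to picture the plug-in point is a pipeline that takes stage 4 as a parameter. This is a minimal sketch, not Anneal's code: the stage names mirror the list at the top of the page, while `run_pipeline`, `VARIANTS`, and the planner bodies are hypothetical placeholders.

```python
from typing import Callable, Dict, List

# Stage 4 is the only variant-specific step: it takes a task and returns a plan.
Planner = Callable[[str], str]

SHARED_BEFORE = ("intent_gate", "probe", "enrich")  # stages 1-3
SHARED_AFTER = ("review", "validate", "emit")       # stages 5-7


def run_pipeline(task: str, stage4: Planner) -> List[str]:
    """Return a trace of the seven stages, with stage 4 supplied by the caller."""
    trace = list(SHARED_BEFORE)
    trace.append("plan:" + stage4(task))
    trace.extend(SHARED_AFTER)
    return trace


# Placeholder planners standing in for the three variants.
VARIANTS: Dict[str, Planner] = {
    "cast": lambda task: f"single plan for {task}",
    "alloy": lambda task: f"blended plan for {task}",
    "temper": lambda task: f"deepened plan for {task}",
}

# Adding a new variant touches only the stage-4 registry.
VARIANTS["foundry"] = lambda task: f"fine-tuned plan for {task}"
```

Running the same task through any two entries of `VARIANTS` yields traces that differ only at the stage-4 position, which is the property the shared contract relies on.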

Three architectures. One contract. The difference is stage 4.

anneal · shared-contract