Three Pipelines, One Contract
Cast, Alloy, and Temper share stages 1–3 and 5–7 exactly. Only stage 4 diverges. The eight invariants apply to all three with no exceptions. Swap variants on the same task and the output contract stays identical.
Same seven stages. Stage 4 is the variable.
| Variant | Stage-4 shape | Momus audits |
|---|---|---|
| Cast | One Prometheus call — linear single-planner | The single plan |
| Alloy | N parallel Prometheus-Alloy calls (biased) → Synthesizer blends → one plan | The blend, not the individual variants |
| Temper | Prometheus-Temper rewrites once per depth; Red-Team runs at every depth; convergence-check.py decides exit | One Momus score per depth (0–100); convergence is deterministic |
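The Cast and Alloy rows differ only in how stage 4 arrives at its single plan. Here is a minimal Python sketch of that difference. Several names are hypothetical stand-ins for the agent calls: `prometheus`, `prometheus_alloy`, `synthesize`, and `momus_score`. The bias labels are also made up, and a thread pool stands in for the parallel Prometheus-Alloy dispatch.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Plan:
    text: str
    source: str

def prometheus(task: str) -> Plan:
    """Hypothetical stand-in for Cast's single Prometheus call."""
    return Plan(f"plan for {task}", source="prometheus")

def prometheus_alloy(task: str, bias: str) -> Plan:
    """Hypothetical stand-in for one biased Prometheus-Alloy call."""
    return Plan(f"{bias} plan for {task}", source=f"alloy:{bias}")

def synthesize(variants: List[Plan]) -> Plan:
    """Hypothetical Synthesizer. Joining texts is a placeholder for real blending."""
    return Plan(" | ".join(p.text for p in variants), source="synthesizer")

def momus_score(plan: Plan) -> int:
    """Toy 0-100 score; the real audit criteria are not described in the text."""
    return 100 if plan.text else 0

def cast_stage4(task: str) -> Plan:
    # Cast: one linear planner call produces the one plan Momus audits.
    return prometheus(task)

def alloy_stage4(task: str, biases: List[str]) -> Plan:
    # Alloy: N biased calls run in parallel; Momus later audits only the blend.
    with ThreadPoolExecutor(max_workers=max(1, len(biases))) as pool:
        # pool.map preserves input order, so the blend is deterministic.
        variants = list(pool.map(lambda bias: prometheus_alloy(task, bias), biases))
    return synthesize(variants)
```

In both variants the function returns exactly one `Plan`. That single return type is what lets stages 5–7 stay identical downstream.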
Three architectures, side by side (figure: Cast, a linear 9-agent pipeline; Alloy, a tournament at stage 4; Temper, a deepen loop at stage 4)
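Temper's row in the table says two things: convergence is deterministic, and convergence-check.py decides the exit. Below is a minimal Python sketch of what a deterministic exit rule can look like, assuming the rule uses only the per-depth Momus scores. Under that assumption the loop stops in three cases:

- the latest score reaches a target,
- the gain over the previous depth falls below `min_gain`, or
- `max_depth` is reached, where `None` models an unbounded run.

The thresholds and the `deepen` helper are hypothetical. They are not taken from convergence-check.py.

```python
from typing import Callable, List, Optional, Tuple

def converged(scores: List[int], target: int = 90, min_gain: int = 2,
              max_depth: Optional[int] = 5) -> bool:
    """Deterministic exit rule: a pure function of the Momus score history.

    target, min_gain, and max_depth are illustrative values, not the ones
    convergence-check.py uses.
    """
    if not scores:
        return False
    if scores[-1] >= target:                      # good enough
        return True
    if len(scores) >= 2 and scores[-1] - scores[-2] < min_gain:
        return True                               # stalled or regressed
    return max_depth is not None and len(scores) >= max_depth  # None = unbounded

def deepen(plan: str,
           rewrite: Callable[[str, List[str]], str],
           red_team: Callable[[str], List[str]],
           momus: Callable[[str], int],
           max_depth: Optional[int] = 5) -> Tuple[str, List[int]]:
    """Hypothetical deepen loop: one rewrite, one red-team pass, one score per depth."""
    scores: List[int] = []
    findings: List[str] = []
    while True:
        plan = rewrite(plan, findings)            # Prometheus-Temper rewrite
        findings = red_team(plan)                 # Red Team runs at every depth
        scores.append(momus(plan))                # one 0-100 Momus score per depth
        if converged(scores, max_depth=max_depth):
            return plan, scores
```

The same score history always yields the same exit decision. Because scores are capped at 100, the `min_gain` rule alone also guarantees termination when `max_depth` is `None`.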
Non-negotiable. All three variants. No exceptions.
The commands refuse flags that would break these invariants. Invalid plans are re-looped, not accepted. Refused flags are logged and ignored.
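Refusing a flag here means logging it and dropping it rather than failing the command. An invalid plan sends the run back through planning instead of being accepted. The small Python sketch below shows both behaviors. The text does not list the refused flags or say whether re-loops are capped, so two parts are hypothetical: the flag names in `INVARIANT_BREAKING` and the `max_attempts` cap.

```python
import logging
from typing import Callable, List, TypeVar

logger = logging.getLogger("pipeline.flags")

# Hypothetical examples only: the text does not list which flags are refused.
INVARIANT_BREAKING = {
    "--no-redteam": "would skip adversarial review",
    "--skip-audit": "would skip the Momus audit",
}

def filter_flags(argv: List[str]) -> List[str]:
    """Drop invariant-breaking flags (bare or --flag=value), logging each one."""
    kept: List[str] = []
    for arg in argv:
        name = arg.split("=", 1)[0]
        if name in INVARIANT_BREAKING:
            logger.warning("refused %s: %s", arg, INVARIANT_BREAKING[name])
            continue  # logged and ignored, not fatal
        kept.append(arg)
    return kept

P = TypeVar("P")

def plan_until_valid(make_plan: Callable[[int], P], is_valid: Callable[[P], bool],
                     max_attempts: int = 3) -> P:
    """Re-loop planning until a plan validates; an invalid plan is never returned.

    max_attempts is a hypothetical safety cap, not something the text specifies.
    """
    for attempt in range(1, max_attempts + 1):
        plan = make_plan(attempt)
        if is_valid(plan):
            return plan
    raise RuntimeError(f"no valid plan after {max_attempts} attempts")
```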
- {variant}-{run_id}.xml Opus 4.7 semantic-XML prompt and a plan/ markdown directory. Both ship. XML is for machines; the plan directory is for humans.
- ~/.claude/skills/ and the project's .claude/skills/. Matching skills inject automatically: name-based injection against semantic roles.
- --loop lifts to unbounded.
- xargs -P. Temper's Red Team fans out inside the deepen loop. Sequential is a fallback, not a design choice.
- --type ultrabrain|deep|quick. The harness maps to a model. Plugins declare category requirements; the runtime resolves them. No hardcoded model names.

Parallelism comes from batching, not background
The Red-Team Trinity fan-out is load-bearing. Emit three Task tool calls in a single assistant message, one per adversary. Do NOT set run_in_background: true on any of them. Doing so makes the dispatches fire-and-forget and breaks the pipeline: Oracle is invoked with empty inputs, and the rollup reports 3/3 PASS with no actual review.
It is tempting to set run_in_background: true on red-team dispatches, thinking it improves parallelism. It doesn't: the Task tool already executes multiple calls in one message concurrently. The guard against this mistake is the emission gate's simultaneous_pass check, which fails if the envelopes are empty.

```
# CORRECT: three synchronous Task calls in one message.
# The runtime executes them concurrently.
[single assistant message]:
  Task(redteam-security, plan_N)      # run_in_background: false (default)
  Task(redteam-scope, plan_N)         # run_in_background: false
  Task(redteam-assumptions, plan_N)   # run_in_background: false

# WRONG: fires and forgets; Oracle has no input.
[single assistant message]:
  Task(redteam-security, plan_N, run_in_background: true)     # ← breaks pipeline
  Task(redteam-scope, plan_N, run_in_background: true)
  Task(redteam-assumptions, plan_N, run_in_background: true)
```
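The same failure can be reproduced with Python's asyncio as an analogy. This is not the Task tool's API:

- `review` is a hypothetical stand-in for one adversary dispatch.
- `oracle_rollup` is a hypothetical Oracle step.
- `simultaneous_pass` only mirrors the described gate, failing when any envelope is missing or empty.

Awaiting all three calls together gives concurrency. Scheduling them and reading results right away leaves Oracle with nothing.

```python
import asyncio
from typing import List, Optional

ADVERSARIES = ("redteam-security", "redteam-scope", "redteam-assumptions")

async def review(adversary: str, plan: str) -> dict:
    """Hypothetical stand-in for one Task dispatch; the sleep simulates latency."""
    await asyncio.sleep(0.01)
    return {"adversary": adversary, "verdict": "PASS", "notes": f"reviewed {plan}"}

def simultaneous_pass(envelopes: List[Optional[dict]]) -> bool:
    """Mirrors the described gate: fail if any of the three envelopes is empty."""
    return len(envelopes) == len(ADVERSARIES) and all(envelopes)

async def fan_out_synchronous(plan: str) -> List[Optional[dict]]:
    # All three start together and are awaited together: concurrent, not detached.
    return list(await asyncio.gather(*(review(a, plan) for a in ADVERSARIES)))

async def fan_out_background(plan: str) -> List[Optional[dict]]:
    # Fire-and-forget: tasks are scheduled, but results are read before any finishes.
    envelopes: List[Optional[dict]] = [None] * len(ADVERSARIES)

    async def run(i: int, adversary: str) -> None:
        envelopes[i] = await review(adversary, plan)

    pending = [asyncio.create_task(run(i, a)) for i, a in enumerate(ADVERSARIES)]
    snapshot = list(envelopes)  # what Oracle would receive: all None
    for task in pending:        # tidy up so the demo exits cleanly
        task.cancel()
    return snapshot

def oracle_rollup(envelopes: List[Optional[dict]]) -> str:
    """Hypothetical Oracle step: refuse to roll up empty envelopes."""
    if not simultaneous_pass(envelopes):
        raise RuntimeError("simultaneous_pass failed: empty red-team envelope")
    passed = sum(e["verdict"] == "PASS" for e in envelopes)
    return f"{passed}/{len(ADVERSARIES)} PASS"
```

Without the gate, a rollup that counted only missing failures would report a clean pass on the background variant. With the gate, it raises instead.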
Identical upstream. Comparable downstream.
Keeping stages 1–3 and 5–7 identical across variants means that if Alloy produces a stronger plan than Cast on the same task, the difference comes from the tournament. It does not come from upstream noise (different probe strategies) or downstream noise (different red teams).
It also means users can swap variants without relearning anything: the command signatures are symmetric, and each flag means the same thing in every variant. Output directories are structurally identical except for variant-specific additions (variants/ for Alloy, depth-history.json for Temper).
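The "structurally identical except for additions" claim can be written as a check. This Python sketch rests on two assumptions the text does not confirm:

- each run's top-level output is exactly the {variant}-{run_id}.xml prompt and the plan/ directory (the two artifacts the invariants say always ship), plus the additions named above;
- variant names appear lowercase in file names.

```python
from typing import Dict, Set

# Additions named in the text; "cast" is empty because none is mentioned for it.
VARIANT_ADDITIONS: Dict[str, Set[str]] = {
    "cast": set(),
    "alloy": {"variants/"},
    "temper": {"depth-history.json"},
}

def expected_outputs(variant: str, run_id: str) -> Set[str]:
    """Top-level outputs of one run under the assumed layout."""
    shared = {f"{variant}-{run_id}.xml", "plan/"}  # both always ship
    return shared | VARIANT_ADDITIONS[variant]

def shared_shape(variant: str, run_id: str, paths: Set[str]) -> Set[str]:
    """Remove variant-specific additions and template the variant name back out."""
    core = paths - VARIANT_ADDITIONS[variant]
    return {p.replace(f"{variant}-{run_id}", "{variant}-" + run_id) for p in core}

def structurally_identical(runs: Dict[str, Set[str]], run_id: str) -> bool:
    """True if every variant's observed outputs reduce to the same shared shape."""
    shapes = [shared_shape(v, run_id, paths) for v, paths in runs.items()]
    return all(shape == shapes[0] for shape in shapes)
```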
And it means new variants can be added at stage 4 alone — a hypothetical anneal-foundry that fine-tunes a local model on the task shape would slot in at stage 4 with no changes to stages 1–3 or 5–7. This is the same philosophy that makes Unix pipelines composable.
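The Unix-pipeline comparison can be made concrete. Treat each stage as a function from state to state and compose the stages left to right. In this Python sketch only the stage-4 slot changes between variants, so registering the hypothetical anneal-foundry touches one dictionary entry. Every stage body is a stub (`tag`) that only records its name; nothing here reflects how the real stages are implemented.

```python
from functools import reduce
from typing import Callable, Dict

Stage = Callable[[dict], dict]

def tag(name: str) -> Stage:
    """Stub stage that only records that it ran; real stage logic is not modeled."""
    def stage(state: dict) -> dict:
        return {**state, "trace": state["trace"] + [name]}  # new dict, no mutation
    return stage

SHARED_BEFORE = [tag("stage1"), tag("stage2"), tag("stage3")]
SHARED_AFTER = [tag("stage5"), tag("stage6"), tag("stage7")]

STAGE4: Dict[str, Stage] = {
    "cast": tag("cast:single-plan"),
    "alloy": tag("alloy:tournament"),
    "temper": tag("temper:deepen-loop"),
}

def build_pipeline(variant: str) -> Stage:
    stages = SHARED_BEFORE + [STAGE4[variant]] + SHARED_AFTER
    # Compose left to right, like `a | b | c` in a shell pipeline.
    return lambda state: reduce(lambda acc, stage: stage(acc), stages, state)

# Adding a variant touches only the stage-4 registry. "anneal-foundry" is the
# hypothetical variant from the text; its body here is a stub.
STAGE4["anneal-foundry"] = tag("anneal-foundry:local-finetune")
```

`build_pipeline` reads the registry when it is called, so the new variant needs no change to `SHARED_BEFORE` or `SHARED_AFTER`.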