Anneal · Reference

Three Pipelines, One Contract

Cast, Alloy, and Temper share stages 1–3 and 5–7 exactly. Only stage 4 diverges. The eight invariants apply to all three with no exceptions. Swap variants on the same task and the output contract stays identical.

01 The Seven-Stage Spine

Same seven stages. Stage 4 is the variable.

Stage 1

Intent Gate

Classify task · Reject unsafe inputs

Stage 2

Probe

Scan codebase · Enumerate skills · Read docs

Stage 3

Enrich

Metis flags ambiguity + emits directives

Stage 4

Plan

VARIANT-SPECIFIC · Cast / Alloy / Temper

Stage 5

Review

Red-Team Trinity → Oracle synthesis

Stage 6

Validate

Hephaestus builds + exercises artifact

Stage 7

Emit

Atlas writes XML + plan dir OR re-loops

Variant	Stage-4 shape	Momus audits
Cast	One Prometheus call — linear single-planner	The single plan
Alloy	N parallel Prometheus-Alloy calls (biased) → Synthesizer blends → one plan	The blend, not the individual variants
Temper	Prometheus-Temper rewrites once per depth; Red-Team runs at every depth; convergence-check.py decides exit	One Momus score per depth (0–100); convergence is deterministic
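The shared spine can be pictured as a fixed list of stages with a single swappable slot. The sketch below is illustrative only: the function names, the `STAGE_4` registry, and the string "plans" are hypothetical stand-ins, and Temper's per-depth red team and convergence check are left out for brevity.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a planner takes the enriched task and returns one plan.
Planner = Callable[[str], str]


def cast_plan(task: str) -> str:
    # Cast: a single linear planner call.
    return f"plan({task})"


def alloy_plan(task: str, n: int = 3) -> str:
    # Alloy: N biased planners, then a synthesizer blends them into one plan.
    variants = [f"plan({task}, bias={i})" for i in range(n)]
    return "blend(" + " | ".join(variants) + ")"


def temper_plan(task: str, depths: int = 3) -> str:
    # Temper: one rewrite per depth (red team and convergence check omitted here).
    plan = f"plan({task})"
    for depth in range(1, depths + 1):
        plan = f"deepen{depth}({plan})"
    return plan


# Assumed registry: stage 4 is the only variant-specific slot.
STAGE_4: Dict[str, Planner] = {
    "cast": cast_plan,
    "alloy": alloy_plan,
    "temper": temper_plan,
}

SHARED_BEFORE = ["intent-gate", "probe", "enrich"]
SHARED_AFTER = ["review", "validate", "emit"]


def run_spine(variant: str, task: str) -> Tuple[List[str], str]:
    """Return (stage trace, plan). Only the stage-4 entry depends on the variant."""
    plan = STAGE_4[variant](task)
    trace = SHARED_BEFORE + [f"plan:{variant}"] + SHARED_AFTER
    return trace, plan
```

Whatever happens inside the slot, each variant hands exactly one plan to stage 5, which is what keeps the downstream stages identical.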

02 Pipeline Diagrams

Three architectures, side by side

Cast — Linear 9-agent pipeline

Alloy — Tournament at stage 4

Temper — Deepen loop at stage 4

03 The Eight Shared Invariants

Non-negotiable. All three variants. No exceptions.

The commands refuse flags that would break these invariants. Invalid plans are re-looped, not accepted. Refused flags are logged and ignored.

Invariant 01

Red Team Trinity always runs

Security · Scope · Assumptions — three parallel adversaries. No flag disables this. Cast runs red team once at stage 5. Alloy runs it once after synthesis. Temper runs it at every depth.

Invariant 02

Functional validation always runs

Hephaestus builds the real artifact in a scratch worktree, captures real build output, and compares against success criteria. No mocks, no stubs, no test files. Ever. Empty evidence files are invalid.
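The "empty evidence files are invalid" rule lends itself to a small check. This is a minimal sketch, not Anneal's validator: the function name is hypothetical, and treating missing or whitespace-only files as empty is an assumption.

```python
from pathlib import Path
from typing import Iterable, List


def invalid_evidence(paths: Iterable[Path]) -> List[Path]:
    """Return the evidence files that are missing or empty.

    Assumption: a file containing only whitespace counts as empty.
    """
    bad: List[Path] = []
    for path in paths:
        if not path.is_file() or not path.read_text(errors="replace").strip():
            bad.append(path)
    return bad
```

A validation step built on this would fail whenever the returned list is non-empty.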

Invariant 03

Dual output — XML + plan directory

Every successful run produces two artifacts: an Opus 4.7 semantic-XML prompt named {variant}-{run_id}.xml, and a plan/ directory of markdown files. Both ship. The XML is for machines; the plan directory is for humans.
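As an illustration of the naming rule, the sketch below derives both output locations from a variant and a run id. The lowercase variant names, the `out_dir` parameter, and placing plan/ next to the XML file are assumptions; the source only fixes the {variant}-{run_id}.xml pattern and the plan/ directory name.

```python
from pathlib import Path
from typing import Tuple

VARIANTS = ("cast", "alloy", "temper")  # assumed lowercase spelling in file names


def output_paths(out_dir: Path, variant: str, run_id: str) -> Tuple[Path, Path]:
    """Return (xml_path, plan_dir) for a run; both must exist for the run to ship."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return out_dir / f"{variant}-{run_id}.xml", out_dir / "plan"


def has_dual_output(xml_path: Path, plan_dir: Path) -> bool:
    # Dual output means the XML file and the plan directory are both present.
    return xml_path.is_file() and plan_dir.is_dir()
```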

Invariant 04

Skill enrichment

The probe stage scans ~/.claude/skills/ and the project's .claude/skills/. Matching skills are injected automatically: each skill is matched by name against the semantic roles in play.
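A rough sketch of the two-root scan follows. It assumes one directory per skill, lets the project root override the user root on a name clash, and uses simple substring matching; none of those details are confirmed by the source, and all function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def default_roots(project: Path) -> List[Path]:
    # The two locations named in the invariant: user-level, then project-level.
    return [Path.home() / ".claude" / "skills", project / ".claude" / "skills"]


def discover_skills(roots: Iterable[Path]) -> Dict[str, Path]:
    """Map skill name -> directory. Later roots override earlier ones (assumption)."""
    found: Dict[str, Path] = {}
    for root in roots:
        if not root.is_dir():
            continue  # a missing root is not an error
        for entry in sorted(root.iterdir()):
            if entry.is_dir():
                found[entry.name] = entry
    return found


def match_skills(skills: Dict[str, Path], roles: Iterable[str]) -> List[str]:
    """Name-based matching: keep skills whose name contains any role keyword."""
    keywords = list(roles)
    return sorted(name for name in skills if any(k in name for k in keywords))
```

Calling `discover_skills(default_roots(project))` would cover both locations in one pass.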

Invariant 05

Unbounded re-loop on FAIL

A failed run folds its failure into the next run's constraints. Cast folds it in as a Metis directive; Alloy re-runs the tournament via the Intent Gate; Temper resets depth=0 and re-runs the deepen loop. The default cap is 3 iterations; --loop lifts the cap so the re-loop is unbounded.
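The loop behaviour can be sketched as follows. `run_once` is a hypothetical callable standing in for a full pipeline run; passing `max_iterations=None` plays the role of --loop. How each variant actually encodes the folded failure is not modelled here.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: constraints in, (passed, failure_reason) out.
RunOnce = Callable[[List[str]], Tuple[bool, str]]


def run_with_reloop(
    run_once: RunOnce, max_iterations: Optional[int] = 3
) -> Tuple[bool, List[str]]:
    """Re-run until PASS, folding each failure into the next run's constraints.

    max_iterations=3 mirrors the default cap; None removes the cap.
    """
    constraints: List[str] = []
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        passed, reason = run_once(list(constraints))
        if passed:
            return True, constraints
        constraints.append(reason)  # the failure becomes a constraint
    return False, constraints
```

With the default cap, a task that needs two folded failures before it passes succeeds on the third and final iteration.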

Invariant 06

Parallelization by default

Red Team Trinity fans out (three Task calls in one message). Alloy's N planners fan out via xargs -P. Temper's Red Team fans out inside the deepen loop. Sequential is a fallback, not a design choice.

Invariant 07

Category routing, not model picking

The user specifies --type ultrabrain|deep|quick. The harness maps to a model. Plugins declare category requirements; the runtime resolves them. No hardcoded model names.
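Category routing amounts to a lookup that the harness owns. In this sketch the routing table is passed in rather than hardcoded, and the model identifiers in the usage note are placeholders, not names Anneal uses.

```python
from typing import Dict

CATEGORIES = ("ultrabrain", "deep", "quick")  # the --type values from the invariant


def resolve_model(category: str, routing: Dict[str, str]) -> str:
    """Map a task category to a model id using a harness-supplied table."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if category not in routing:
        raise LookupError(f"no model configured for category {category!r}")
    return routing[category]
```

For example, `resolve_model("deep", {"deep": "model-medium"})` returns the placeholder id `"model-medium"`. Agents and plugins only ever name the category, so swapping the table changes the models without touching them.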

Invariant 08

Dual prompts by model family

Agents ship Claude-flavored and GPT-flavored prompts in the same file. The runtime picks at dispatch time. This isolates prompt-format differences from the agent's semantic role.
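One way to picture dual prompts is an agent record carrying both flavors, with the choice deferred to dispatch. The dataclass, the field names, and inferring the family from a model-id prefix are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPrompts:
    role: str    # semantic role, shared by both prompt flavors
    claude: str  # Claude-flavored prompt text
    gpt: str     # GPT-flavored prompt text


def model_family(model_id: str) -> str:
    # Assumption: the family can be read off the model id prefix.
    lowered = model_id.lower()
    for family in ("claude", "gpt"):
        if lowered.startswith(family):
            return family
    raise ValueError(f"unknown model family for {model_id!r}")


def pick_prompt(agent: AgentPrompts, model_id: str) -> str:
    """Choose the prompt flavor at dispatch time; the role stays the same."""
    return getattr(agent, model_family(model_id))
```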

04 Dispatch Mechanics

Parallelism comes from batching, not background

The Red-Team Trinity fan-out is load-bearing. In a single assistant message, three Task tool calls are emitted — one per adversary. Do NOT set run_in_background: true on any of them. That makes dispatches fire-and-forget and breaks the pipeline: Oracle gets invoked with empty inputs, and the rollup reports 3/3 PASS with no actual review.

The common mistake: setting run_in_background: true on red-team dispatches, thinking it improves parallelism. It doesn't. The Task tool already executes multiple calls in one message concurrently. The guard against this is the emission gate's simultaneous_pass check — which fails if envelopes are empty.

pseudocode · correct dispatch pattern

# CORRECT: three synchronous Task calls in one message.
# The runtime executes them concurrently.
[single assistant message]:
  Task(redteam-security, plan_N)      # run_in_background: false (default)
  Task(redteam-scope, plan_N)         # run_in_background: false
  Task(redteam-assumptions, plan_N)   # run_in_background: false

# WRONG: fires and forgets; Oracle has no input.
[single assistant message]:
  Task(redteam-security, plan_N,    run_in_background: true)  # ← breaks pipeline
  Task(redteam-scope, plan_N,       run_in_background: true)
  Task(redteam-assumptions, plan_N, run_in_background: true)
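The same distinction can be reproduced with asyncio as an analogy. This is not the Task tool itself: `review`, the envelope fields, and the body of `simultaneous_pass` are hypothetical, chosen only to show that awaited concurrent calls return full envelopes while fire-and-forget leaves them empty.

```python
import asyncio
from typing import Dict, List

ADVERSARIES = ("redteam-security", "redteam-scope", "redteam-assumptions")


async def review(adversary: str, plan: str) -> Dict[str, str]:
    await asyncio.sleep(0.01)  # stands in for real review work
    return {"adversary": adversary, "verdict": "PASS", "findings": f"reviewed {plan}"}


async def dispatch_batched(plan: str) -> List[Dict[str, str]]:
    # Batched: all three run concurrently AND are awaited before moving on.
    return list(await asyncio.gather(*(review(a, plan) for a in ADVERSARIES)))


async def dispatch_background(plan: str) -> List[Dict[str, str]]:
    # Fire-and-forget: tasks are started but never awaited, so nothing has
    # been written into the envelopes by the time the next stage reads them.
    tasks = [asyncio.create_task(review(a, plan)) for a in ADVERSARIES]
    envelopes: List[Dict[str, str]] = [{} for _ in tasks]
    for task in tasks:
        task.cancel()  # demo cleanup only
    return envelopes


def simultaneous_pass(envelopes: List[Dict[str, str]]) -> bool:
    """Assumed gate logic: three envelopes, each non-empty and marked PASS."""
    return len(envelopes) == len(ADVERSARIES) and all(
        bool(e.get("findings")) and e.get("verdict") == "PASS" for e in envelopes
    )
```

Running `simultaneous_pass(asyncio.run(dispatch_batched("plan_N")))` yields `True`, while the background variant yields `False` because every envelope is empty.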

05 Why Only Stage 4 Differs

Identical upstream. Comparable downstream.

Keeping stages 1–3 and 5–7 identical across variants means: if Alloy produces a stronger plan than Cast on the same task, the difference is genuinely in the tournament — not in upstream noise from different probe strategies or downstream noise from different red teams.

It also means users can swap variants without relearning. The command signatures are symmetric. The flags mean the same things. Output directories are structurally identical except for variant-specific additions (variants/ for Alloy, depth-history.json for Temper).

And it means new variants can be added at stage 4 alone — a hypothetical anneal-foundry that fine-tunes a local model on the task shape would slot in at stage 4 with no changes to stages 1–3 or 5–7. This is the same philosophy that makes Unix pipelines composable.

Three architectures. One contract. The difference is stage 4.

anneal · shared-contract

06 Variant Selection Guide

Which variant for which problem?

v0.1.0

Cast

Linear · 9 agents · ~4 min

The problem is scoped. The tradeoffs are known. You want depth, not breadth. Bug fixes, feature additions, migrations with a clear happy path.

v0.1.0

Alloy

Tournament · N planners · ~6 min

The plan shape is non-obvious. One planner will miss dimensions. Database choice, auth strategy, greenfield architecture, novel design.

v0.1.0

Temper

Deepen loop · depth 3 · ~7 min

The target is complex-but-scoped and genuinely improves with iteration. Auth unification, event bus redesign, flaky migrations, retry semantics.
