Temper — Fixed-Point Deepen
One plan. Heated and cooled repeatedly. The Red-Team Trinity attacks it at every depth, Momus scores it 0–100, and a deterministic Python script decides when to stop. The loop exits when the numbers say stop — not when the LLM says stop.
The deepen loop in detail
Depth 0 is the seed. Every subsequent depth gets the prior plan, the full red-team envelopes, and Momus's score. Every rewrite is a full rewrite — not a patch.
Seed (depth 0):
1. Prometheus-Temper writes plan_0 from (Metis directives, probe report)
2. Red-Team Trinity attacks plan_0 in parallel
3. Momus scores plan_0 → score_0
4. convergence-check.py(depth=0, scores=[score_0], cap=hard_cap) → continue or exit
Iteration (depth N, N ≥ 1):
1. Prometheus-Temper rewrites plan_{N-1} with:
(plan_{N-1}, momus_envelope_{N-1}, redteam_envelopes_{N-1},
metis_directives, depth_scores)
2. Red-Team Trinity attacks plan_N in parallel
3. Momus scores plan_N → score_N
4. convergence-check.py(depth=N, scores=depth_scores, cap=hard_cap) → continue or exit
Exit: plan_final = plan_N, where N is the depth at which the loop exited
Three rules. First one wins.
The loop exits when any one of these is true. The script is the spec — not the LLM, not an implicit judgment.
- Rule 1: Variance of top-3 depth scores < 0.3. Scores have stabilized. The last three rewrites are producing substantively the same plan, and further iteration won't change that.
depth_scores = [72, 81, 87, 86, 87] top_3_scores = [87, 87, 86] variance = 0.22 → < 0.3 → CONVERGED
- Rule 2: |Δ score| < 0.15 across the last 2 depths. Marginal improvement, diminishing returns. The plan is still changing, but the changes are small enough that further depth won't produce qualitatively better output.
depth_scores = [65, 78, 83, 84.1, 84.0] delta = |84.0 - 84.1| = 0.1 → < 0.15 → CONVERGED
- Rule 3: depth == hard_cap. Runaway iteration guard. Default cap is 3; user-configurable from 1 to 5. This rule guarantees no run spins forever.
depth_scores = [45, 52, 58, 61, 63] depth = 5, cap = 5 → CONVERGED (cap reached)
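The three rules and the loop around them can be sketched in a few lines of Python. This is a minimal illustration, not the contents of convergence-check.py: the function names (`top3_variance`, `last_delta`, `convergence_reason`, `run_deepen`), the reason strings, and the `write_plan` / `score_plan` callables are all hypothetical, and the red-team step is omitted for brevity. It assumes population variance (which reproduces the 0.22 in the Rule 1 example), assumes "top-3" means the three highest scores, and assumes a rule is skipped when there are too few scores to evaluate it.

```python
from statistics import pvariance
from typing import Callable, List, Optional, Sequence, Tuple

VARIANCE_THRESHOLD = 0.3  # Rule 1 threshold, from the text
DELTA_THRESHOLD = 0.15    # Rule 2 threshold, from the text


def top3_variance(scores: Sequence[float]) -> Optional[float]:
    """Population variance of the three highest scores; None with fewer than three."""
    if len(scores) < 3:
        return None
    return pvariance(sorted(scores, reverse=True)[:3])


def last_delta(scores: Sequence[float]) -> Optional[float]:
    """Absolute score change across the last two depths; None with fewer than two."""
    if len(scores) < 2:
        return None
    return abs(scores[-1] - scores[-2])


def convergence_reason(depth: int, scores: Sequence[float], hard_cap: int = 3) -> Optional[str]:
    """Return the first rule that fires, or None to keep iterating."""
    variance = top3_variance(scores)
    if variance is not None and variance < VARIANCE_THRESHOLD:
        return "variance"   # Rule 1
    delta = last_delta(scores)
    if delta is not None and delta < DELTA_THRESHOLD:
        return "delta"      # Rule 2
    if depth >= hard_cap:
        return "hard_cap"   # Rule 3
    return None


def run_deepen(
    write_plan: Callable[[Optional[str], List[float]], str],  # hypothetical planner
    score_plan: Callable[[str], float],                       # hypothetical scorer
    hard_cap: int = 3,
) -> Tuple[str, List[float], str]:
    """Seed at depth 0, then do full rewrites until one of the rules fires."""
    plan = write_plan(None, [])
    scores = [score_plan(plan)]
    depth = 0
    while (reason := convergence_reason(depth, scores, hard_cap)) is None:
        depth += 1
        plan = write_plan(plan, scores)  # full rewrite, not a patch
        scores.append(score_plan(plan))
    return plan, scores, reason


print(convergence_reason(4, [72, 81, 87, 86, 87], hard_cap=5))  # prints "variance"
```

Because the cap check runs on every pass and `depth` only increases, the loop always terminates. One consequence of the population-variance assumption: the scores in the Rule 2 example, [65, 78, 83, 84.1, 84.0], also have a top-3 variance of about 0.25, so a first-match check written this way would report Rule 1 for that run even though Rule 2 holds as well.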
How deep do you need to go?
| Depth | Spawns | Wall-clock | Use when |
|---|---|---|---|
| 1 | ~8 | ~3 min | You want the deepen discipline but trust the seed |
| 2 | ~16 | ~5 min | Plan has one known weakness the seed won't catch |
| 3 (default) | ~24 | ~7 min | Typical complex-but-scoped task |
| 4 | ~32 | ~10 min | Plan requires three substantive rewrites |
| 5 | ~40 | ~13 min | Really melt the rock |
The 0–100 quality estimate
Momus's score is not a sum of findings — it's a direct quality estimate: "would I, as a senior engineer, sign off on this plan for implementation right now?"
| Range | Verdict | Meaning |
|---|---|---|
| 100 | SAFE | Ship it now — no remaining concerns |
| 85–99 | SAFE | All major gaps closed, only minor polish left |
| 70–84 | CAUTION | Non-blocking concerns, plan is implementable |
| 50–69 | RISKY | Significant gaps, human review required |
| 0–49 | BLOCK | Plan is not implementable as written |
95 — Every phase has a measurable gate. Hephaestus has a buildable target at phase-0. I'd ship it.
75 — Correct and implementable but error-path handling is thin. Risk is manageable.
55 — Structural gaps — a named service isn't declared as a dependency. Needs a rewrite.
35 — Plan contradicts itself (phase-2 removes a file phase-4 edits). Not implementable.
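The verdict bands in the table above map directly onto a small lookup. The sketch below is illustrative only: `verdict_for` is a hypothetical name, and treating each band's lower bound as a threshold (so a fractional score such as 84.1 falls in CAUTION) is an assumption, since the table lists only integer ranges.

```python
def verdict_for(score: float) -> str:
    """Map a 0-100 quality estimate to its verdict band (lower bounds assumed inclusive)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within 0-100, got {score}")
    if score >= 85:
        return "SAFE"
    if score >= 70:
        return "CAUTION"
    if score >= 50:
        return "RISKY"
    return "BLOCK"


# The four worked examples above land in four different bands.
for example_score in (95, 75, 55, 35):
    print(example_score, verdict_for(example_score))
```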
Three real deepen runs
Example 1 — Unify OIDC and legacy JWT auth
Auth unification is a canonical deepen-friendly task. Pass 1 catches the happy path. Pass 2 catches the migration envelope. Pass 3 catches clock skew and token refresh edge cases — things a single planner pass never surfaces.
/anneal-temper:anneal --depth 3 "Rewrite the auth middleware to unify OIDC and legacy JWT flows. Both must continue to work during a 90-day transition. The unified middleware must expose a single Resolver interface."
Why not Cast: Cast produced a 65-score plan on the same task — the skew-tolerance gap was not visible to a single planner pass. Temper's depth-1 rewrite surfaced it via Red-Team-Security; depth-2 fixed it.
Example 2 — Redesign the event bus
The problem is bounded (event bus, not general messaging) but the design space is rich: Kafka vs NATS vs Redis Streams vs a custom Postgres-backed log. In this depth-5 run, the design pivots at depth 1 and depth 3 would never happen under Cast's single-pass discipline.
/anneal-temper:anneal --depth 5 "Redesign the event bus: currently a mix of Postgres LISTEN/NOTIFY, Redis pub/sub, and in-process EventEmitter. Consolidate into one bus: durable delivery, ordered per-aggregate, replay from arbitrary offset, cross-service fanout."
Example 3 — Replace a flaky migration
The task is narrow. Depth 1 covers the obvious phased migration. Depth 2 catches concurrent inserts during backfill, trigger interactions, and replica lag: exactly the "weird edge cases" that cost an incident.
/anneal-temper:anneal --depth 2 "The 2025-08-12 migration that added tenant_id locks tables for ~4 minutes in production. Rewrite the migration to complete with zero downtime. Existing data must be preserved."
Right tool, right problem
What Temper writes to disk
| File | What's in it |
|---|---|
| depth-history.json | Per-depth plans, envelopes, scores, convergence check output (diff-able) |
| plan-depth-0.md | Raw seed plan (preserved, not deleted) |
| plan-depth-1.md, plan-depth-2.md… | Raw intermediate plans per depth |
| reviews/momus-envelope-depth-N.yaml | One Momus envelope per depth with score 0–100 |
| reviews/redteam-*-envelope-depth-N.yaml | Three red-team envelopes per depth |
| plan/plan.md | Final blended plan overview, ≤80 lines |
| plan/phase-NN-*.md | Final phase files with success criteria |
| {variant}-{run_id}.xml | Opus 4.7 semantic-XML prompt for one-shot execution |
| rollup.yaml | convergence_reason, depth_scores, simultaneous_pass, emission decision |
depth-history.json is designed to be diff-able. To see exactly what each rewrite changed, run diff -u on two consecutive depth plans, for example plan-depth-0.md and plan-depth-1.md.
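The same comparison can be done from Python with the standard library's `difflib`. This is a small sketch rather than part of Temper: `diff_plans` is a hypothetical helper, and the default file names are only labels for the diff header, taken from the table above.

```python
import difflib


def diff_plans(
    old_text: str,
    new_text: str,
    old_name: str = "plan-depth-0.md",
    new_name: str = "plan-depth-1.md",
) -> str:
    """Return a unified diff between two plan texts; empty string if they match."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=old_name,
            tofile=new_name,
        )
    )


# Toy plan texts standing in for the contents of two consecutive plan files.
seed = "Phase 0: scaffold\nPhase 1: migrate\n"
rewrite = "Phase 0: scaffold\nPhase 1: migrate with backfill guard\n"
print(diff_plans(seed, rewrite), end="")
```

In practice the two strings would be read from the plan files on disk, for instance with `pathlib.Path(...).read_text()`.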