Key Concepts
R_eff (Effective Reliability)
Every decision has a computed trust score: R_eff. It's calculated from evidence attached to the decision — test results, measurements, benchmarks, user feedback.
Formula: R_eff = min(effective_score) across all evidence items.
This is the weakest link principle — the decision is only as trustworthy as its weakest piece of evidence. No averaging, no optimistic roll-ups.
- R_eff >= 0.5 — healthy, decision is trustworthy
- R_eff < 0.5 — degraded, surfaces in stale scan
- R_eff < 0.3 — AT RISK, needs immediate attention
- No evidence — decision is fresh, not degraded (treated as healthy)
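The rules above can be sketched in a few lines of Python. This is a minimal illustration, not haft's implementation: the function names, the status labels, and the choice to report only the most severe matching tier are assumptions made for the example.

```python
from typing import Optional, Sequence

HEALTHY_MIN = 0.5  # R_eff >= 0.5 is healthy
AT_RISK_MAX = 0.3  # R_eff < 0.3 is AT RISK


def r_eff(effective_scores: Sequence[float]) -> Optional[float]:
    """Weakest link: the minimum effective score, or None when there is no evidence."""
    return min(effective_scores) if effective_scores else None


def status(effective_scores: Sequence[float]) -> str:
    """Map R_eff to a status label (label names are hypothetical)."""
    value = r_eff(effective_scores)
    if value is None:
        return "fresh"  # no evidence: treated as healthy, not degraded
    if value < AT_RISK_MAX:
        return "at_risk"  # assumption: report the most severe tier only
    if value < HEALTHY_MIN:
        return "degraded"
    return "healthy"
```

For example, `status([0.9, 0.8, 0.4])` returns `"degraded"`: two strong evidence items do not compensate for the weak one.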
Congruence Level (CL)
Not all evidence is equally relevant. A benchmark from the same project (CL3) is more trustworthy than a blog post about a similar stack (CL1). CL penalties reduce the effective score of cross-context evidence:
| CL | Context | Penalty | Example |
|---|---|---|---|
| CL3 | Same context | 0.0 | Internal test result |
| CL2 | Similar context | 0.1 | Decision from a related project (same language) |
| CL1 | Different context | 0.4 | External documentation, blog post |
| CL0 | Opposed context | 0.9 | Evidence from a conflicting methodology |
CL matters for cross-project recall too. When a decision from another project surfaces during
/h-frame, it gets tagged CL2 (same language) or CL1 (different language).
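A small sketch of how the penalty table might be applied. The text does not say how a penalty combines with a score, so this example assumes it is subtracted and the result is floored at 0; the function names are hypothetical.

```python
# Penalties copied from the table above, keyed by congruence level.
CL_PENALTY = {3: 0.0, 2: 0.1, 1: 0.4, 0: 0.9}


def effective_score(score: float, cl: int) -> float:
    """Assumption: the CL penalty is subtracted and the result floored at 0."""
    return max(0.0, score - CL_PENALTY[cl])


def recall_cl(same_language: bool) -> int:
    """CL tag for a decision recalled from another project during /h-frame."""
    return 2 if same_language else 1
```

Under this assumption, evidence scored 0.9 stays at 0.9 as CL3 but falls to about 0.5 as CL1, right on the healthy threshold.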
Weakest Link (WLNK)
Every variant in /h-explore must identify its weakest link — the single thing that
bounds its quality. This is not a generic "cons" list. It's the specific mechanism that will
fail first under stress.
WLNK applies everywhere in haft: R_eff is a minimum (not an average), gate decisions use worst-wins (not voting), and evidence chains break at the weakest item.
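To make worst-wins concrete, here is a hedged sketch of a gate that combines several verdicts. The verdict names (`pass`, `warn`, `block`) and their ordering are invented for illustration; haft's actual gate vocabulary may differ.

```python
from typing import Iterable

# Hypothetical verdict names, ordered from least to most severe.
SEVERITY = {"pass": 0, "warn": 1, "block": 2}


def gate(verdicts: Iterable[str]) -> str:
    """Worst-wins: return the most severe verdict, however many others passed."""
    return max(verdicts, key=SEVERITY.__getitem__, default="pass")
```

With majority voting, two `pass` verdicts would outvote one `block`. With worst-wins, `gate(["pass", "pass", "block"])` returns `"block"`.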
Evidence Decay
Evidence has an optional valid_until date. When evidence expires, its score drops
to 0.1 regardless of its original verdict. This pulls R_eff down, making the decision surface
as stale.
The intuition: a benchmark from 6 months ago is not as trustworthy as one from last week. Expired evidence doesn't become false; it becomes weak, which is why its score drops to 0.1 rather than 0.0.
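A minimal sketch of the decay rule, assuming evidence counts as expired once the current date is strictly after `valid_until`. The `Evidence` class and `decayed_score` function are hypothetical names for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

EXPIRED_SCORE = 0.1  # expired evidence is weak, not false


@dataclass(frozen=True)
class Evidence:
    score: float
    valid_until: Optional[date] = None  # optional, as described above


def decayed_score(evidence: Evidence, today: date) -> float:
    """Return 0.1 for expired evidence, otherwise the original score."""
    # Assumption: evidence is still valid on the valid_until date itself.
    if evidence.valid_until is not None and today > evidence.valid_until:
        return EXPIRED_SCORE
    return evidence.score
```

Because R_eff is a minimum, a single expired item is enough to pull a decision below the 0.3 AT RISK threshold.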
Indicator Roles
When characterizing a problem (/h-char), each comparison dimension gets a role:
- constraint — hard limit, must satisfy. Variants that violate it are eliminated.
- target — what you're optimizing. 1-3 targets max.
- observation — monitor but do NOT optimize. This is Anti-Goodhart: when a metric becomes a target, it ceases to be a good metric. Mark things as observation to prevent reward hacking.
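The three roles could be modeled as below. This is an illustrative sketch, not haft's data model: the `Role` enum, the `optimized_dimensions` helper, and the dimension names in the usage note are assumptions.

```python
from enum import Enum
from typing import Dict, List


class Role(Enum):
    CONSTRAINT = "constraint"    # hard limit, must be satisfied
    TARGET = "target"            # what is being optimized
    OBSERVATION = "observation"  # monitored, never optimized

MAX_TARGETS = 3


def optimized_dimensions(roles: Dict[str, Role]) -> List[str]:
    """Return the target dimensions, enforcing the 1-3 targets rule."""
    targets = [name for name, role in roles.items() if role is Role.TARGET]
    if not 1 <= len(targets) <= MAX_TARGETS:
        raise ValueError(f"expected 1-{MAX_TARGETS} targets, got {len(targets)}")
    return targets
```

Given `{"p99_latency": Role.TARGET, "cost": Role.TARGET, "cache_hit_rate": Role.OBSERVATION}`, only `p99_latency` and `cost` are returned. The observation is kept out of optimization on purpose.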
Pareto Front
After comparison (/h-compare), variants are plotted on a Pareto front: the set
of options that no other option dominates. One variant dominates another when it is at least as good on every target dimension and strictly better on at least one.
If variant A is better on latency but worse on cost, and variant B is the reverse, both are on the Pareto front. Neither dominates the other. Your job is to make the trade-off explicit, not pretend one is objectively "best."
In v6, Pareto computation is constraint-aware: variants that violate any
constraint-role indicator are eliminated before dominance comparison. This
prevents infeasible options from cluttering the front.
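A constraint-aware Pareto front can be sketched as follows. The example assumes that lower is better on every target (true for latency and cost) and that constraint violations are already known per variant. The `Variant` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Variant:
    name: str
    targets: Dict[str, float]  # assumption: lower is better on every target
    violates_constraint: bool = False


def dominates(a: Variant, b: Variant) -> bool:
    """a dominates b: no worse on every target, strictly better on at least one."""
    keys = a.targets.keys()
    no_worse = all(a.targets[k] <= b.targets[k] for k in keys)
    strictly_better = any(a.targets[k] < b.targets[k] for k in keys)
    return no_worse and strictly_better


def pareto_front(variants: List[Variant]) -> List[Variant]:
    """Drop constraint violators first, then keep the non-dominated variants."""
    feasible = [v for v in variants if not v.violates_constraint]
    return [
        v for v in feasible
        if not any(dominates(other, v) for other in feasible if other is not v)
    ]
```

Filtering before the dominance check matters: an infeasible variant with excellent numbers would otherwise dominate feasible ones and push them off the front.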
Parity
A comparison is junk if the options weren't evaluated fairly. Parity means: same inputs, same scope, same budget, same measurement procedure for all variants. If you benchmarked Redis on production hardware and Memcached on a laptop — that's not a fair comparison.
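One way to check parity mechanically is to record the four conditions for each variant and report any that differ. This is a sketch under assumptions: the field names mirror the list above, but the record shape and the `parity_violations` helper are invented for illustration.

```python
from typing import Dict, List

PARITY_FIELDS = ("inputs", "scope", "budget", "procedure")


def parity_violations(runs: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the parity fields whose values differ between variants."""
    return [
        field
        for field in PARITY_FIELDS
        if len({run[field] for run in runs.values()}) > 1
    ]
```

An empty result means the variants were evaluated under matching conditions. Any returned field names the dimension on which the comparison is unfair.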
Transformer Mandate
From FPF: a system cannot transform itself. The agent that generates options cannot be the sole validator of those options. In practice:
- The agent generates variants — the human decides
- The verification gate challenges decisions before recording
- Measurements without independent verification get CL1 (self-evidence), not CL3
Claims and Predictions
A claim is a structured, falsifiable prediction attached to a decision. It has three components:
- observable — what to measure ("p99 latency of /api/search")
- threshold — what counts as success ("< 200ms")
- verify_after — when to check ("2026-05-15")
Claims are falsifiable by design. When verify_after passes and the claim
is unverified, /h-verify scan surfaces it. Verification results attach as
evidence, pulling R_eff up (claim held) or down (claim falsified). This closes the loop
between prediction and reality.
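The three components map naturally onto a small record. The sketch below is illustrative only: the `Claim` class, the `verified` flag, and the `due_claims` helper are assumptions, and it treats a claim as due once the current date is strictly after `verify_after`.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Claim:
    observable: str     # what to measure
    threshold: str      # what counts as success
    verify_after: date  # when to check
    verified: bool = False


def due_claims(claims: List[Claim], today: date) -> List[Claim]:
    """Claims whose verify_after date has passed and that are still unverified."""
    # Assumption: a claim becomes due the day after verify_after.
    return [c for c in claims if not c.verified and today > c.verify_after]
```

A scan in the spirit of /h-verify would surface whatever `due_claims` returns. Attaching the verification result as evidence then moves R_eff in either direction.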
Projections
The same artifact graph can be rendered for different audiences. A projection is a deterministic transformation — same input always produces the same output, just filtered and formatted for a specific reader:
- engineer — invariants, affected files, drift status, WLNK
- manager — decision titles, health summary, stale count, coverage
- audit — full evidence chain, CL tags, dates, supersession history
- compare — side-by-side variant comparison with Pareto front
Projections are computed, not stored. There's one source of truth (the artifact graph);
projections are views. Use /h-view to select a projection.
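Treating a projection as a pure function makes the "computed, not stored" property easy to see. In this sketch the field names are loosely derived from the audience list above and the flat dictionary record is a simplification; none of it reflects haft's actual schema.

```python
from typing import Any, Dict

# Hypothetical field names, loosely following the audience list above.
PROJECTION_FIELDS = {
    "engineer": ("invariants", "affected_files", "drift_status", "wlnk"),
    "audit": ("evidence_chain", "cl_tags", "dates", "supersession_history"),
}


def project(decision: Dict[str, Any], audience: str) -> Dict[str, Any]:
    """Pure view: select the fields relevant to one audience, mutate nothing."""
    wanted = PROJECTION_FIELDS[audience]
    return {key: decision[key] for key in wanted if key in decision}
```

Because `project` has no state and no side effects, calling it twice on the same record yields equal results, and the source record is left untouched.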
The Two Cycles
Decisions don't end at recording. The observation cycle (drift, evidence decay, claim verification) feeds signals back into the decision cycle (frame, explore, compare, decide). Failed measurements create new problems. Stale decisions trigger re-evaluation. See Decision Lifecycle for how haft supports each stage.
Next
- FAQ — common questions about haft
- Decision lifecycle — how decisions age and refresh