Key Concepts

R_eff (Effective Reliability)

Every decision has a computed trust score: R_eff. It's calculated from evidence attached to the decision — test results, measurements, benchmarks, user feedback.

Formula: R_eff = min(effective_score) across all evidence items.

This is the weakest link principle — the decision is only as trustworthy as its weakest piece of evidence. No averaging, no optimistic roll-ups.

  • R_eff >= 0.5 — healthy, decision is trustworthy
  • R_eff < 0.5 — degraded, surfaces in stale scan
  • R_eff < 0.3 — AT RISK, needs immediate attention
  • No evidence — decision is fresh, not degraded (treated as healthy)
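
A minimal sketch of the weakest-link calculation and the bands listed above. The function names `r_eff` and `status` and the label strings are illustrative, not haft's actual API; only the `min` rule and the 0.5 / 0.3 thresholds come from the text.

```python
from typing import Optional, Sequence

HEALTHY_THRESHOLD = 0.5  # from the list above
AT_RISK_THRESHOLD = 0.3  # from the list above


def r_eff(effective_scores: Sequence[float]) -> Optional[float]:
    """Return the weakest effective score, or None when there is no evidence."""
    if not effective_scores:
        return None
    return min(effective_scores)


def status(effective_scores: Sequence[float]) -> str:
    """Map R_eff onto the bands listed above (label strings are hypothetical)."""
    score = r_eff(effective_scores)
    if score is None:
        return "fresh"  # no evidence: treated as healthy, not degraded
    if score < AT_RISK_THRESHOLD:
        return "at risk"
    if score < HEALTHY_THRESHOLD:
        return "degraded"
    return "healthy"


# Two strong items do not compensate for one weak item.
print(status([0.9, 0.8, 0.4]))
```

An average of those three scores would be 0.7 and look healthy; the minimum is 0.4, so the decision is reported as degraded.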

Congruence Level (CL)

Not all evidence is equally relevant. A benchmark from the same project (CL3) is more trustworthy than a blog post about a similar stack (CL1). CL penalties reduce the effective score of cross-context evidence:

CL    Context            Penalty  Example
CL3   Same context       0.0      Internal test result
CL2   Similar context    0.1      Decision from a related project (same language)
CL1   Different context  0.4      External documentation, blog post
CL0   Opposed context    0.9      Evidence from a conflicting methodology

CL matters for cross-project recall too. When a decision from another project surfaces during /h-frame, it gets tagged CL2 (same language) or CL1 (different language).
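
The sketch below shows one way the CL penalties could feed into an effective score. The penalty values come from the table above. Subtracting the penalty from a base score and flooring the result at 0 is an assumption made for illustration; the text does not say how haft combines the two, and `effective_score` is a hypothetical name.

```python
# Penalty per congruence level, taken from the table above.
CL_PENALTY = {3: 0.0, 2: 0.1, 1: 0.4, 0: 0.9}


def effective_score(base_score: float, cl: int) -> float:
    """Apply a CL penalty to a base score.

    Assumption: the penalty is subtracted and the result is floored at 0.
    """
    return max(0.0, base_score - CL_PENALTY[cl])


# The same base score is worth less the further the evidence is from this context.
for level in (3, 2, 1, 0):
    print(f"CL{level}: {effective_score(0.9, level):.2f}")
```

Under this assumption, a strong external blog post (CL1) can still drag R_eff down to the degraded boundary, because R_eff takes the minimum over all effective scores.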

Weakest Link (WLNK)

Every variant in /h-explore must identify its weakest link — the single thing that bounds its quality. This is not a generic "cons" list. It's the specific mechanism that will fail first under stress.

WLNK applies everywhere in haft: R_eff is min (not average), gate decisions use worst-wins (not voting), evidence chains break at the weakest item.

Evidence Decay

Evidence has an optional valid_until date. When evidence expires, its score drops to 0.1 regardless of its original verdict. This pulls R_eff down, making the decision surface as stale.

The intuition: a benchmark from 6 months ago is not as trustworthy as one from last week. Evidence doesn't become false — it becomes weak. 0.1, not 0.0.
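
A minimal sketch of the decay rule. The `valid_until` field and the 0.1 floor come from the text; the function name `decayed_score` is hypothetical, and treating evidence as still valid on the `valid_until` date itself is an assumption.

```python
from datetime import date
from typing import Optional

EXPIRED_SCORE = 0.1  # from the text: expired evidence is weak, not false


def decayed_score(score: float, valid_until: Optional[date], today: date) -> float:
    """Return the score after applying evidence decay.

    Assumption: evidence expires the day after valid_until.
    """
    if valid_until is not None and today > valid_until:
        return EXPIRED_SCORE  # regardless of the original verdict
    return score  # no valid_until set, or not yet expired


today = date(2026, 6, 1)
print(decayed_score(0.9, date(2026, 5, 1), today))  # expired
print(decayed_score(0.9, None, today))              # never expires
```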

Indicator Roles

When characterizing a problem (/h-char), each comparison dimension gets a role:

  • constraint — hard limit, must satisfy. Variants that violate it are eliminated.
  • target — what you're optimizing. 1-3 targets max.
  • observation — monitor but do NOT optimize. This is Anti-Goodhart: when a metric becomes a target, it ceases to be a good metric. Mark things as observation to prevent reward hacking.

Pareto Front

After comparison (/h-compare), variants are plotted on a Pareto front: the set of variants that no other variant dominates. One variant dominates another when it is at least as good on every target dimension and strictly better on at least one.

If variant A is better on latency but worse on cost, and variant B is the reverse, both are on the Pareto front. Neither dominates the other. Your job is to make the trade-off explicit, not pretend one is objectively "best."

In v6, Pareto computation is constraint-aware: variants that violate any constraint-role indicator are eliminated before dominance comparison. This prevents infeasible options from cluttering the front.
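
The constraint-aware computation can be sketched as a filter followed by a dominance check. Everything named here (`Variant`, `feasible`, `dominates`, `pareto_front`) is hypothetical, and the sketch assumes every target is "lower is better"; it is not haft's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Variant:
    name: str
    targets: Dict[str, float]  # target-role indicators, lower is better (assumption)
    feasible: bool = True      # False if any constraint-role indicator is violated


def dominates(a: Variant, b: Variant) -> bool:
    """a dominates b: no worse on every target, strictly better on at least one."""
    keys = a.targets.keys()
    no_worse = all(a.targets[k] <= b.targets[k] for k in keys)
    better = any(a.targets[k] < b.targets[k] for k in keys)
    return no_worse and better


def pareto_front(variants: List[Variant]) -> List[Variant]:
    # Step 1: eliminate variants that violate a constraint.
    feasible = [v for v in variants if v.feasible]
    # Step 2: keep only variants that no other feasible variant dominates.
    return [
        v for v in feasible
        if not any(dominates(other, v) for other in feasible if other is not v)
    ]


variants = [
    Variant("A", {"latency_ms": 50, "cost": 10}),
    Variant("B", {"latency_ms": 120, "cost": 4}),
    Variant("C", {"latency_ms": 130, "cost": 12}),                # dominated by A
    Variant("D", {"latency_ms": 10, "cost": 1}, feasible=False),  # violates a constraint
]
front = pareto_front(variants)
print([v.name for v in front])
```

D would dominate every other variant on the targets, but it is removed before the dominance comparison, so A and B remain as the trade-off to decide between.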

Parity

A comparison is junk if the options weren't evaluated fairly. Parity means: same inputs, same scope, same budget, same measurement procedure for all variants. If you benchmarked Redis on production hardware and Memcached on a laptop — that's not a fair comparison.

Transformer Mandate

From FPF: a system cannot transform itself. The agent that generates options cannot be the sole validator of those options. In practice:

  • The agent generates variants — the human decides
  • The verification gate challenges decisions before recording
  • Measurements without independent verification get CL1 (self-evidence), not CL3

Claims and Predictions

A claim is a structured, falsifiable prediction attached to a decision. It has three components:

  • observable — what to measure ("p99 latency of /api/search")
  • threshold — what counts as success ("< 200ms")
  • verify_after — when to check ("2026-05-15")

Claims are falsifiable by design. When verify_after passes and the claim is unverified, /h-verify scan surfaces it. Verification results attach as evidence, pulling R_eff up (claim held) or down (claim falsified). This closes the loop between prediction and reality.
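
A minimal sketch of a claim record and the check that surfaces overdue claims. The three field names come from the list above; the `Claim` class, the `verified` flag, and `due_claims` are hypothetical and only illustrate the idea, not the behavior of /h-verify scan itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Claim:
    observable: str         # what to measure
    threshold: str          # what counts as success
    verify_after: date      # when to check
    verified: bool = False  # hypothetical flag: has a verification result been attached?


def due_claims(claims: List[Claim], today: date) -> List[Claim]:
    """Return unverified claims whose verify_after date has passed."""
    return [c for c in claims if not c.verified and today > c.verify_after]


claims = [
    Claim("p99 latency of /api/search", "< 200ms", date(2026, 5, 15)),
    Claim("p99 latency of /api/search", "< 200ms", date(2026, 5, 15), verified=True),
]
overdue = due_claims(claims, today=date(2026, 5, 16))
print(len(overdue))
```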

Projections

The same artifact graph can be rendered for different audiences. A projection is a deterministic transformation — same input always produces the same output, just filtered and formatted for a specific reader:

  • engineer — invariants, affected files, drift status, WLNK
  • manager — decision titles, health summary, stale count, coverage
  • audit — full evidence chain, CL tags, dates, supersession history
  • compare — side-by-side variant comparison with Pareto front

Projections are computed, not stored. There's one source of truth (the artifact graph); projections are views. Use /h-view to select a projection.
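
One way to picture "computed, not stored" is a set of pure functions over the same records. The record fields, the sample data, and the function names below are all hypothetical; the real artifact graph schema is not shown in this section. The 0.5 threshold is the degraded boundary from the R_eff section.

```python
from typing import Dict, List

# Hypothetical, made-up artifact records used only for illustration.
DECISIONS: List[Dict] = [
    {"title": "Use Redis for session cache", "r_eff": 0.8, "wlnk": "single-node failover"},
    {"title": "Adopt gRPC internally", "r_eff": 0.4, "wlnk": "client codegen drift"},
]


def manager_view(decisions: List[Dict]) -> Dict:
    """Manager projection: titles and a stale count, no implementation detail."""
    stale = [d for d in decisions if d["r_eff"] < 0.5]
    return {"titles": [d["title"] for d in decisions], "stale_count": len(stale)}


def engineer_view(decisions: List[Dict]) -> List[Dict]:
    """Engineer projection: each decision with its weakest link."""
    return [{"title": d["title"], "wlnk": d["wlnk"]} for d in decisions]


# Same input, same output: neither view keeps state or writes anything back.
print(manager_view(DECISIONS))
print(engineer_view(DECISIONS))
```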

The Two Cycles

Decisions don't end at recording. The observation cycle (drift, evidence decay, claim verification) feeds signals back into the decision cycle (frame, explore, compare, decide). Failed measurements create new problems. Stale decisions trigger re-evaluation. See Decision Lifecycle for how haft supports each stage.
