The Release Decision Loop: From Intent to Learning

Eight stages. Each stage produces a concrete artifact. Each artifact feeds the next stage. The loop closes when learning becomes the input to the next intent.

8 min read · Updated March 2026

TL;DR

  • The release decision loop has eight stages: Intent → Hypothesis → Reversibility → Exposure → Measurement → Evidence → Decision → Learning.
  • Each stage produces an artifact — intent statement, hypothesis, flag config, metric definition, data, decision record, learning.
  • The loop only closes when learning becomes input to the next intent. Without CF-08, you are accumulating releases, not knowledge.
  • The framework is lightweight by design — one intent.md file and a conventions document are enough to run it.

The Loop at a Glance

| Stage | Question answered | Artifact produced |
| --- | --- | --- |
| CF-01 Intent | What business outcome are we trying to change? | Intent statement in intent.md |
| CF-02 Hypothesis | What change will move that outcome, and why? | Falsifiable hypothesis with success criteria |
| CF-03 Reversibility | How do we roll back if the hypothesis is wrong? | Feature flag definition |
| CF-04 Exposure | Who sees the change, how much traffic, when? | Rollout config (split, targeting, duration) |
| CF-05 Measurement | What is the one metric that tells us if it worked? | Primary metric + guardrails + event spec |
| CF-06 Evidence | Is the data sufficient to decide? | Evidence sufficiency check |
| CF-07 Decision | CONTINUE, PAUSE, ROLLBACK, or INCONCLUSIVE? | Decision record with evidence summary |
| CF-08 Learning | What did we learn, and what should we test next? | Learning artifact + next hypothesis |

CF-01: Intent — What Are We Trying to Change?

Intent is not a feature request. It is a business outcome: 'increase the rate at which homepage visitors become paid self-hosting customers.' The tactical idea (add deploy buttons) is deliberately stripped out; what remains is the measurable outcome the team is trying to move. Intent clarification prevents the loop from starting with a solution rather than a problem.
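As a sketch, an intent.md entry might look like the following. The file name comes from the framework; the field names are illustrative, not prescribed:

```markdown
## Intent: homepage-to-paid conversion

- Outcome: increase the rate at which homepage visitors become
  paid self-hosting customers
- Baseline: current conversion rate, from analytics
- Why now: business context for prioritizing this outcome
- Explicitly NOT the intent: "ship deploy buttons" (that is a tactic,
  and belongs in the hypothesis, CF-02)
```

Note what is absent: no solution. The tactic enters the loop one stage later, as a falsifiable claim.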

CF-02: Hypothesis — What Will Move It, and Why?

A hypothesis is a causal claim: 'We believe X will cause Y among Z because of W.' It is falsifiable — a clear prediction with a success criterion. Without a falsifiable hypothesis, you cannot know whether the experiment succeeded, failed, or was inconclusive. The hypothesis is written before any code is touched.
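The 'X will cause Y among Z because of W' shape can be captured as a small record that is complete only when every field, including the success criterion, is filled in. A minimal sketch; the class and field names are illustrative, not part of the framework:

```python
from dataclasses import dataclass, fields

@dataclass
class Hypothesis:
    change: str             # X: what we will ship
    expected_effect: str    # Y: the outcome we predict will move
    population: str         # Z: who the effect applies to
    rationale: str          # W: why we believe the causal link holds
    success_criterion: str  # falsifiable threshold, e.g. "+10% relative"

def is_falsifiable(h: Hypothesis) -> bool:
    """A hypothesis is usable only if no field was left blank —
    in particular, there must be a concrete success criterion."""
    return all(getattr(h, f.name).strip() for f in fields(h))

h = Hypothesis(
    change="add one-click deploy buttons to the homepage",
    expected_effect="raise homepage-to-paid conversion",
    population="first-time homepage visitors",
    rationale="deploy friction is the main observed drop-off point",
    success_criterion="conversion +10% relative, guardrails flat",
)
```

Writing the record before any code exists is the point: the success criterion is fixed while the team is still neutral about the outcome.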

CF-03: Reversibility — How Do We Roll Back?

Every change that enters this loop is gated by a feature flag. Not because every change is risky, but because the flag is what makes the experiment controllable. Reversibility is not just about rollback — it is the structural guarantee that allows you to run experiments in production without existential risk.
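The structural guarantee can be made concrete with a flag check at the single call site that branches between treatment and control. A minimal sketch, assuming a flag store that is just a dict; real flag services differ, but the shape is the same:

```python
def deploy_buttons_enabled(flags: dict) -> bool:
    """Rollback is a config change, not a deploy: flip "enabled" off
    and every caller immediately falls back to the control path."""
    flag = flags.get("homepage-deploy-buttons", {})
    return bool(flag.get("enabled", False))

flags = {"homepage-deploy-buttons": {"enabled": True}}
assert deploy_buttons_enabled(flags)        # treatment path live

flags["homepage-deploy-buttons"]["enabled"] = False  # instant rollback
assert not deploy_buttons_enabled(flags)    # control path restored
```

An absent or malformed flag defaults to the control path, so the failure mode of the flag system itself is also reversible.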

CF-04: Exposure — Who Sees It, How Much?

Exposure design answers three questions: which users are eligible (targeting), what fraction sees the treatment (rollout percentage), and how long the observation window runs (duration). These three parameters determine whether the experiment will collect enough data to decide, and whether the data will be interpretable.
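The rollout-percentage question is usually answered with deterministic bucketing: hash the user into a stable position in [0, 1), so the same user always gets the same assignment and widening the percentage only adds users, never reshuffles them. A sketch under those assumptions:

```python
import hashlib

def in_treatment(user_id: str, flag_name: str, rollout_pct: float) -> bool:
    """Deterministic bucketing: hashing (flag, user) gives each user a
    stable bucket in [0, 1); the user is in treatment while the rollout
    percentage exceeds their bucket."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform-ish in [0, 1)
    return bucket < rollout_pct

# Widening from 10% to 50% keeps the original 10% in treatment:
ten   = {u for u in map(str, range(1000)) if in_treatment(u, "f", 0.10)}
fifty = {u for u in map(str, range(1000)) if in_treatment(u, "f", 0.50)}
assert ten <= fifty
```

Keying the hash on the flag name as well as the user prevents the same users from always landing in treatment across unrelated experiments.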

CF-05: Measurement — What Is the One Metric?

One north-star metric per experiment. Defined before the flag is turned on. The metric must be (a) directly tied to the business outcome from CF-01, (b) trackable across the conversion path, and (c) meaningful at the expected sample size. Guardrail metrics are also defined — these are signals that must not degrade while the primary metric improves.
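Guardrails are easiest to enforce when each one names a metric and the worst value tolerated while the primary metric is being optimized. A minimal sketch with illustrative metric names:

```python
def guardrails_healthy(observed: dict, guardrails: dict) -> bool:
    """Each guardrail maps a metric name to the floor it must not
    fall below while the experiment runs."""
    return all(observed[metric] >= floor
               for metric, floor in guardrails.items())

# Illustrative: page health must not degrade while conversion is pushed.
guardrails = {"error_free_rate": 0.995, "p95_latency_ok_rate": 0.99}
observed   = {"error_free_rate": 0.997, "p95_latency_ok_rate": 0.993}
```

Defining the floors before the flag is turned on matters for the same reason the success criterion does: it removes the temptation to rationalize degradation after the fact.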

CF-06/07: Evidence and Decision

CF-06 asks whether the data is sufficient to decide. CF-07 frames the outcome into one of four categories: CONTINUE (primary metric positive, guardrails healthy), PAUSE (mixed signal, investigate), ROLLBACK CANDIDATE (evidence of harm), or INCONCLUSIVE (insufficient data). The decision record documents the evidence and the explicit next action.
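The two stages compose into a small decision function: sufficiency first, then the mapping from primary metric and guardrails to one of the four categories. A sketch; the thresholds that define "sufficient" and "positive" would come from CF-02 and CF-05:

```python
def decide(sufficient: bool, primary_lift: float, guardrails_ok: bool) -> str:
    """Map the CF-06 evidence check onto the four CF-07 categories."""
    if not sufficient:
        return "INCONCLUSIVE"        # insufficient data: extend or redesign
    if not guardrails_ok or primary_lift < 0:
        return "ROLLBACK CANDIDATE"  # evidence of harm
    if primary_lift > 0:
        return "CONTINUE"            # primary positive, guardrails healthy
    return "PAUSE"                   # flat/mixed signal: investigate
```

Whatever the verdict, the decision record keeps the inputs (sufficiency, lift, guardrail status) alongside the output, so the decision can be audited later.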

CF-08: Learning — What Did We Learn, What Next?

A cycle is not closed until a learning is written. The learning captures five things: what changed, what happened (with numbers), whether the hypothesis was confirmed or refuted, why it likely happened, and the next hypothesis. A learning that is missing the 'why' or the 'next hypothesis' is incomplete; it breaks the loop.
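The five-field requirement is mechanical enough to check automatically. A sketch; the field names are illustrative:

```python
REQUIRED_FIELDS = (
    "what_changed",     # the shipped change
    "what_happened",    # outcome, with numbers
    "verdict",          # hypothesis confirmed or refuted
    "why",              # the likely causal explanation
    "next_hypothesis",  # what this learning feeds into (next CF-01/02)
)

def learning_is_complete(learning: dict) -> bool:
    """The loop only closes when all five fields are non-empty; a
    learning without 'why' or 'next_hypothesis' breaks the loop."""
    return all(learning.get(k, "").strip() for k in REQUIRED_FIELDS)
```

A check like this could gate the end of a cycle the way CI gates a merge: no complete learning, no closed loop.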

FAQ

Does every release need to go through all eight stages?

No. The framework is most valuable for changes where the outcome is uncertain and the stakes are meaningful. Trivial changes (copy fixes, obviously-correct bug fixes) don't require a full loop. The threshold is: would you want to know if this change made things worse?

What if we don't have enough traffic to run an experiment?

Low traffic is a genuine constraint. The loop still applies — you write the hypothesis, gate behind a flag, and collect data over a longer window. If the window is too long to be practical, the learning becomes: 'our traffic volume makes experimentation on this metric impractical; we need a proxy metric or a different approach.'
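Whether the window is "too long to be practical" can be estimated up front with a standard rule of thumb for comparing two proportions (roughly 80% power at a 5% two-sided significance level): n per arm ≈ 16·p(1−p)/δ², where p is the baseline rate and δ the absolute lift you want to detect. A sketch, with illustrative numbers:

```python
def required_days(baseline: float, abs_lift: float, daily_visitors: int) -> float:
    """Rule-of-thumb observation window for a 50/50 split:
    n_per_arm ~ 16 * p(1-p) / delta^2, spread across both arms."""
    n_per_arm = 16 * baseline * (1 - baseline) / abs_lift ** 2
    return 2 * n_per_arm / daily_visitors

# 2% baseline conversion, hoping to detect +0.2pp absolute,
# 400 visitors/day -> ~78,400 per arm -> 392 days. Over a year:
# impractical, so the learning is "use a proxy metric".
window = required_days(baseline=0.02, abs_lift=0.002, daily_visitors=400)
```

Running the number before the experiment starts is itself a CF-06 act: it tells you whether the evidence can ever be sufficient.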

How does the loop handle changes that can't be A/B tested?

Not every change needs a 50/50 split. The loop supports progressive rollouts, internal-first exposure, and targeted segments. The key is that the decision criteria exist before the flag is turned on; the decision is never a post-hoc judgment.
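A progressive rollout is then just a sequence of exposure stages, each with its own pre-agreed gate. A sketch; the stages and gate wordings are illustrative:

```python
# Each stage widens exposure only after the previous gate is met.
STAGES = [
    {"audience": "internal staff", "pct": 1.00, "gate": "no errors for 48h"},
    {"audience": "beta opt-in",    "pct": 1.00, "gate": "guardrails flat for 1 week"},
    {"audience": "all users",      "pct": 0.10, "gate": "primary metric not down"},
    {"audience": "all users",      "pct": 0.50, "gate": "primary metric up"},
    {"audience": "all users",      "pct": 1.00, "gate": "decision record: CONTINUE"},
]

def next_stage(current: int, gate_passed: bool) -> int:
    """Advance one stage only when the current pre-agreed gate is met;
    otherwise hold at the current exposure (or roll back via CF-03)."""
    return min(current + 1, len(STAGES) - 1) if gate_passed else current
```

The gates play the same role the success criterion plays in an A/B test: they are written down before exposure begins, so each widening step is a decision against criteria, not a judgment call.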