How We Increased Enterprise Inquiries by 210% in 2 Weeks
A complete dogfooding case study: FeatBit used its own feature flags and Bayesian A/B testing to validate a homepage UI change that tripled enterprise contact clicks — with a 99.8% posterior probability that the treatment outperforms control.
TL;DR
- Question: Should the hero show a deployment-method selector (Kubernetes / AWS / Docker Compose)?
- Hypothesis: Showing the selector below the hero CTAs will increase enterprise Contact Sales clicks because it removes the discovery gap for self-hosting-intent visitors.
- Flag: homepage-deployment-selector, boolean, 50/50 split, 14-day window.
- Result: Control 9.7% (6/62) → Treatment 30.0% (18/60). Relative lift: +210%. P(treatment wins): 99.8%.
- Decision: CONTINUE — rolled flag to 100% traffic.
The Question That Started It
It started as a simple product question: “Should I add buttons for deploying to Kubernetes and AWS in the hero section?” The intuition was that most FeatBit paid customers self-host — on K8s, on AWS, on Docker — and the hero was not signaling that.
But the release decision framework forced a more useful question: what business outcome does this change affect? Adding buttons is a tactic. The real question is whether exposing deployment paths at the hero increases enterprise evaluation intent — specifically, whether visitors with self-hosting intent are more likely to initiate a sales conversation.
That reframing changed everything. Instead of shipping buttons and assuming they helped, we wrote a hypothesis, chose a measurable outcome, built the feature behind a flag, and ran a 50/50 experiment. The answer came back definitive.
Intent to Hypothesis
The framework walks through three stages before any code is written.
1. Intent clarification
The business goal is to increase the rate at which homepage visitors become paid self-hosting customers. The tactic (“add deploy buttons”) was removed from the framing. The question became: does the hero undersell self-hosting intent?
2. Hypothesis
Written using the template: change / metric / audience / causal reason / success criteria.
3. Reversibility via feature flag
Before writing a line of UI code, the decision was made: this change is gated by a boolean flag. If the hypothesis is refuted, or if a guardrail metric degrades, rolling back means toggling the flag — not reverting commits or deploying a hotfix.
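To make the reversibility concrete, here is a minimal sketch of what gating the component behind a boolean flag looks like. The client interface and its boolVariation method are hypothetical stand-ins, not FeatBit's actual SDK surface:

```typescript
// Hypothetical flag-client interface — a stand-in for whatever
// flag SDK is actually in use, not FeatBit's real API.
interface FlagClient {
  boolVariation(key: string, fallback: boolean): boolean;
}

// Control renders the original hero; treatment appends the selector row.
// Turning the flag off restores the old hero with no code change.
function renderHero(client: FlagClient): string[] {
  const parts = ["headline", "sub-copy", "cta-buttons"];
  if (client.boolVariation("homepage-deployment-selector", false)) {
    parts.push("deployment-selector");
  }
  return parts;
}
```

Because the fallback is false, visitors see the original hero if the flag service is unreachable — failure mode and rollback are the same safe state.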
The Feature Flag
The flag gated the entire selector component. No backend changes were required. Rollback was one click in the FeatBit dashboard.
Flag configuration
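A minimal sketch of the configuration described in the TL;DR — the field names below are illustrative, not FeatBit's actual flag schema:

```typescript
// Illustrative flag-config shape; field names are assumptions,
// not FeatBit's real schema.
interface FlagConfig {
  key: string;
  type: "boolean";
  variations: boolean[];
  rollout: { variation: boolean; weight: number }[]; // weights sum to 1
  windowDays: number;
}

const homepageDeploymentSelector: FlagConfig = {
  key: "homepage-deployment-selector",
  type: "boolean",
  variations: [false, true], // control / treatment
  rollout: [
    { variation: false, weight: 0.5 }, // 50% control
    { variation: true, weight: 0.5 },  // 50% treatment
  ],
  windowDays: 14, // observation window
};
```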
What each variant showed
Control (false): the original hero with headline, sub-copy, and two CTA buttons. No deployment-method signal. Visitors had to discover K8s / AWS support on their own.
Treatment (true): the same hero plus a "Self-host on: [Kubernetes] [AWS] [Docker Compose]" selector row directly below the CTAs. Each option linked to the relevant installation guide.
Measurement Design
Two measurement decisions shaped the experiment before any data was collected.
Decision 1: North-star metric = enterprise_contact_click
The obvious candidate — tracking clicks on the deployment selector itself — was rejected. Selector clicks measure curiosity, not intent. The business goal is paid self-hosting customers, and the closest trackable proxy is a visitor who reaches the pricing page and clicks "Contact Sales" on the Enterprise Standard License plan.
That event — enterprise_contact_click — fires on the pricing page when a visitor clicks the enterprise contact button. It measures downstream intent that is directly correlated with the business outcome, not just engagement with the treatment element.
Decision 2: Cross-page attribution via sessionStorage
The variant is assigned on the homepage, but the conversion event fires on the pricing page. Without bridging the variant across pages, the data would be uninterpretable.
Solution: on homepage load, the variant value is written to sessionStorage as hero_variant. On the pricing page, the contact click handler reads this value and includes it as an event property. The analytics pipeline then filters enterprise_contact_click events by properties.variant to separate control from treatment.
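The bridging glue is small. This sketch mirrors the mechanism above; the track() sender is a hypothetical stand-in for the actual analytics client, and an in-memory fallback is included so the sketch also runs outside a browser:

```typescript
// Cross-page attribution sketch. sessionStorage and the event/key names
// follow the article; track() is a hypothetical analytics sender.

// In-memory fallback so this also runs outside a browser.
const store: Pick<Storage, "getItem" | "setItem"> =
  typeof sessionStorage !== "undefined"
    ? sessionStorage
    : (() => {
        const m = new Map<string, string>();
        return {
          getItem: (k: string) => m.get(k) ?? null,
          setItem: (k: string, v: string) => void m.set(k, v),
        };
      })();

// Homepage: persist the assigned variant for later pages in the session.
function onHomepageLoad(variant: boolean): void {
  store.setItem("hero_variant", String(variant));
}

// Pricing page: attach the stored variant to the conversion event.
function onEnterpriseContactClick(
  track: (event: string, props: Record<string, string>) => void
): void {
  const variant = store.getItem("hero_variant") ?? "unknown";
  track("enterprise_contact_click", { variant });
}
```

sessionStorage (rather than localStorage) scopes attribution to a single browsing session, so a visitor returning days later is not tagged with a stale variant.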
Results: 99.8% Confidence
After 14 days, the results were unambiguous.
| Variant | Sessions (n) | Conversions (k) | CTR | vs Control |
|---|---|---|---|---|
| Control (false) | 62 | 6 | 9.7% | baseline |
| Treatment (true) | 60 | 18 | 30.0% | +210% |
In practical terms: given the observed data and uniform priors, the posterior probability that treatment's true conversion rate exceeds control's is 99.8%. The Beta-Binomial model reaches this level of certainty even with 60–62 sessions per variant because the effect size is so large — treatment converts at more than 3× the rate of control.
The sample (60 per variant) was below our 200-per-variant minimum for a precise magnitude estimate. But a 3× conversion difference is sufficiently robust that doubling or tripling the sample is unlikely to reverse the directional conclusion. We treat +210% as indicative, not precise — the direction is reliable.
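The probability-of-winning figure can be reproduced with a Beta-Binomial posterior comparison. The sketch below assumes uniform Beta(1,1) priors (which may differ from FeatBit's actual model) and uses a seeded Monte Carlo draw from each posterior:

```typescript
// Estimates P(treatment rate > control rate) under independent
// Beta(1,1) priors. A seeded PRNG keeps the run reproducible.

// mulberry32: tiny seeded uniform PRNG.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
const rand = mulberry32(42);

// Standard normal via Box-Muller.
function randNormal(): number {
  const u = Math.max(rand(), 1e-12);
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rand());
}

// Marsaglia–Tsang gamma sampler (valid for shape >= 1, which holds here).
function randGamma(shape: number): number {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = randNormal();
    const v = Math.pow(1 + c * x, 3);
    if (v <= 0) continue;
    if (Math.log(rand()) < 0.5 * x * x + d - d * v + d * Math.log(v)) {
      return d * v;
    }
  }
}

// Beta(a, b) as a ratio of gammas.
function randBeta(a: number, b: number): number {
  const g1 = randGamma(a);
  return g1 / (g1 + randGamma(b));
}

function probTreatmentWins(
  control: { n: number; k: number },
  treatment: { n: number; k: number },
  draws = 100_000
): number {
  let wins = 0;
  for (let i = 0; i < draws; i++) {
    // Posteriors: control Beta(7, 57), treatment Beta(19, 43).
    const pc = randBeta(1 + control.k, 1 + control.n - control.k);
    const pt = randBeta(1 + treatment.k, 1 + treatment.n - treatment.k);
    if (pt > pc) wins++;
  }
  return wins / draws;
}

const p = probTreatmentWins({ n: 62, k: 6 }, { n: 60, k: 18 });
console.log(p.toFixed(3)); // close to the article's reported 0.998
```

A quick normal approximation agrees: the posterior difference in rates has mean ≈ 0.20 and standard deviation ≈ 0.07, putting zero nearly three standard deviations below the mean.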
The CONTINUE Decision
The framework offers four decision categories. With 99.8% confidence and +210% lift on the north-star metric, and no guardrail degradation, the category was clear.
CONTINUE operationally meant:
1. Toggle the homepage-deployment-selector flag to 100% traffic (true for all visitors).
2. Archive the experiment — no more data collection, observation window closed.
3. Write the learning artifact: what changed, what happened, confirmed or refuted, why, next hypothesis.
4. Update intent.md with last_learning and reset stage to intent for the next cycle.
5. Form the next hypothesis: if a navigation signal drove this lift, does the headline narrative also need to change?
What Comes Next
The deployment selector fixed a navigation gap — visitors no longer had to discover K8s and AWS support on their own. But the hero narrative still leads with cost messaging: "Cut costs by 20x." The +210% lift suggests self-hosting-intent visitors are highly responsive to even small signals of production readiness. If a four-word selector row drives this much lift, what happens if the headline itself leads with self-hosting conviction?
Next candidate hypothesis
This hypothesis feeds back into CF-01 (Intent) and restarts the loop. The baseline is no longer 9.7% — it is 30.0%. Every future experiment starts from evidence, not opinion.
The entire loop — intent shaping, hypothesis design, flag spec, event tracking, Bayesian analysis, and learning closure — was guided by Claude Code agent skills defined in featbit/featbit-release-decision-agent. The agent replaces the need for a dedicated experiment platform or data team.