How We Increased Enterprise Inquiries by 210% in 2 Weeks
A complete dogfooding case study: FeatBit used its own feature flags and Bayesian A/B testing to validate a homepage UI change that tripled enterprise contact clicks — with a 99.8% posterior probability that the treatment outperforms control.
TL;DR
- Question: Should the hero show a deployment-method selector (Kubernetes / AWS / Docker Compose)?
- Hypothesis: Showing the selector below the hero CTAs will increase enterprise Contact Sales clicks because it removes the discovery gap for self-hosting-intent visitors.
- Flag: homepage-deployment-selector, boolean, 50/50 split, 14-day window.
- Result: Control 9.7% (6/62) → Treatment 30.0% (18/60). Relative lift: +210%. P(treatment wins): 99.8%.
- Decision: CONTINUE — rolled flag to 100% traffic.
The Question That Started It
It started as a simple product question: “Should I add buttons for deploying to Kubernetes and AWS in the hero section?” The intuition was that most FeatBit paid customers self-host — on K8s, on AWS, on Docker — and the hero was not signaling that.
But the release decision framework forced a more useful question: what business outcome does this change affect? Adding buttons is a tactic. The real question is whether exposing deployment paths at the hero increases enterprise evaluation intent — specifically, whether visitors with self-hosting intent are more likely to initiate a sales conversation.
That reframing changed everything. Instead of shipping buttons and assuming they helped, we wrote a hypothesis, chose a measurable outcome, built the feature behind a flag, and ran a 50/50 experiment. The answer came back definitive.
Intent to Hypothesis
The framework walks through three stages before any code is written.
1. Intent clarification
The business goal is to increase the rate at which homepage visitors become paid self-hosting customers. The tactic (“add deploy buttons”) was removed from the framing. The question became: does the hero undersell self-hosting intent?
2. Hypothesis
Written using the template: change / metric / audience / causal reason / success criteria.
3. Reversibility via feature flag
Before writing a line of UI code, the decision was made: this change is gated by a boolean flag. If the hypothesis is refuted, or if a guardrail metric degrades, rolling back means toggling the flag — not reverting commits or deploying a hotfix.
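To make the reversibility concrete, here is a minimal sketch of what gating the component behind a boolean flag looks like. The client interface and its boolVariation method are hypothetical stand-ins, not FeatBit's actual SDK surface:

```typescript
// Hypothetical flag-client interface — a stand-in for whatever
// flag SDK is actually in use, not FeatBit's real API.
interface FlagClient {
  boolVariation(key: string, fallback: boolean): boolean;
}

// Control renders the original hero; treatment appends the selector row.
// Turning the flag off restores the old hero with no code change.
function renderHero(client: FlagClient): string[] {
  const parts = ["headline", "sub-copy", "cta-buttons"];
  if (client.boolVariation("homepage-deployment-selector", false)) {
    parts.push("deployment-selector");
  }
  return parts;
}
```

Because the fallback is false, visitors see the original hero if the flag service is unreachable — failure mode and rollback are the same safe state.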
The Feature Flag
The flag gated the entire selector component. No backend changes were required. Rollback was one click in the FeatBit dashboard.
Flag configuration
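A minimal sketch of the configuration described in the TL;DR — the field names below are illustrative, not FeatBit's actual flag schema:

```typescript
// Illustrative flag-config shape; field names are assumptions,
// not FeatBit's real schema.
interface FlagConfig {
  key: string;
  type: "boolean";
  variations: boolean[];
  rollout: { variation: boolean; weight: number }[]; // weights sum to 1
  windowDays: number;
}

const homepageDeploymentSelector: FlagConfig = {
  key: "homepage-deployment-selector",
  type: "boolean",
  variations: [false, true], // control / treatment
  rollout: [
    { variation: false, weight: 0.5 }, // 50% control
    { variation: true, weight: 0.5 },  // 50% treatment
  ],
  windowDays: 14, // observation window
};
```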
What each variant showed
Control (false): the original hero with headline, sub-copy, and two CTA buttons. No deployment-method signal. Visitors had to discover K8s / AWS support on their own.
Treatment (true): the same hero plus a "Self-host on: [Kubernetes] [AWS] [Docker Compose]" selector row directly below the CTAs. Each option linked to the relevant installation guide.
Measurement Design
Two measurement decisions shaped the experiment before any data was collected.
Decision 1: North-star metric = enterprise_contact_click
The obvious candidate — tracking clicks on the deployment selector itself — was rejected. Selector clicks measure curiosity, not intent. The business goal is paid self-hosting customers, and the closest trackable proxy is a visitor who reaches the pricing page and clicks "Contact Sales" on the Enterprise Standard License plan.
That event — enterprise_contact_click — fires on the pricing page when a visitor clicks the enterprise contact button. It measures downstream intent that is directly correlated with the business outcome, not just engagement with the treatment element.
Decision 2: Cross-page attribution via sessionStorage
The variant is assigned on the homepage, but the conversion event fires on the pricing page. Without bridging the variant across pages, the data would be uninterpretable.
Solution: on homepage load, the variant value is written to sessionStorage as hero_variant. On the pricing page, the contact click handler reads this value and includes it as an event property. The analytics pipeline then filters enterprise_contact_click events by properties.variant to separate control from treatment.
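The bridging glue is small. This sketch mirrors the mechanism above; the track() sender is a hypothetical stand-in for the actual analytics client, and an in-memory fallback is included so the sketch also runs outside a browser:

```typescript
// Cross-page attribution sketch. sessionStorage and the event/key names
// follow the article; track() is a hypothetical analytics sender.

// In-memory fallback so this also runs outside a browser.
const store: Pick<Storage, "getItem" | "setItem"> =
  typeof sessionStorage !== "undefined"
    ? sessionStorage
    : (() => {
        const m = new Map<string, string>();
        return {
          getItem: (k: string) => m.get(k) ?? null,
          setItem: (k: string, v: string) => void m.set(k, v),
        };
      })();

// Homepage: persist the assigned variant for later pages in the session.
function onHomepageLoad(variant: boolean): void {
  store.setItem("hero_variant", String(variant));
}

// Pricing page: attach the stored variant to the conversion event.
function onEnterpriseContactClick(
  track: (event: string, props: Record<string, string>) => void
): void {
  const variant = store.getItem("hero_variant") ?? "unknown";
  track("enterprise_contact_click", { variant });
}
```

sessionStorage (rather than localStorage) scopes attribution to a single browsing session, so a visitor returning days later is not tagged with a stale variant.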
Results: 99.8% Confidence
After 14 days, the results were unambiguous.
| Variant | Sessions (n) | Conversions (k) | CTR | vs Control |
|---|---|---|---|---|
| Control (false) | 62 | 6 | 9.7% | baseline |
| Treatment (true) | 60 | 18 | 30.0% | +210% |
In practical terms: given the observed data and uniform priors, the posterior probability that treatment's true conversion rate exceeds control's is 99.8%. The Beta-Binomial model reaches this level of certainty even with 60–62 sessions per variant because the effect size is so large — treatment converts at more than 3× the rate of control.
The sample (60 per variant) was below our 200-per-variant minimum for a precise magnitude estimate. But a 3× conversion difference is sufficiently robust that doubling or tripling the sample is unlikely to reverse the directional conclusion. We treat +210% as indicative, not precise — the direction is reliable.
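The probability-of-winning figure can be reproduced with a Beta-Binomial posterior comparison. The sketch below assumes uniform Beta(1,1) priors (which may differ from FeatBit's actual model) and uses a seeded Monte Carlo draw from each posterior:

```typescript
// Estimates P(treatment rate > control rate) under independent
// Beta(1,1) priors. A seeded PRNG keeps the run reproducible.

// mulberry32: tiny seeded uniform PRNG.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
const rand = mulberry32(42);

// Standard normal via Box-Muller.
function randNormal(): number {
  const u = Math.max(rand(), 1e-12);
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rand());
}

// Marsaglia–Tsang gamma sampler (valid for shape >= 1, which holds here).
function randGamma(shape: number): number {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = randNormal();
    const v = Math.pow(1 + c * x, 3);
    if (v <= 0) continue;
    if (Math.log(rand()) < 0.5 * x * x + d - d * v + d * Math.log(v)) {
      return d * v;
    }
  }
}

// Beta(a, b) as a ratio of gammas.
function randBeta(a: number, b: number): number {
  const g1 = randGamma(a);
  return g1 / (g1 + randGamma(b));
}

function probTreatmentWins(
  control: { n: number; k: number },
  treatment: { n: number; k: number },
  draws = 100_000
): number {
  let wins = 0;
  for (let i = 0; i < draws; i++) {
    // Posteriors: control Beta(7, 57), treatment Beta(19, 43).
    const pc = randBeta(1 + control.k, 1 + control.n - control.k);
    const pt = randBeta(1 + treatment.k, 1 + treatment.n - treatment.k);
    if (pt > pc) wins++;
  }
  return wins / draws;
}

const p = probTreatmentWins({ n: 62, k: 6 }, { n: 60, k: 18 });
console.log(p.toFixed(3)); // close to the article's reported 0.998
```

A quick normal approximation agrees: the posterior difference in rates has mean ≈ 0.20 and standard deviation ≈ 0.07, putting zero nearly three standard deviations below the mean.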
The CONTINUE Decision
The framework offers four decision categories. With 99.8% confidence and +210% lift on the north-star metric, and no guardrail degradation, the category was clear.
CONTINUE operationally meant:
1. Toggle the homepage-deployment-selector flag to 100% traffic (true for all visitors).
2. Archive the experiment — no more data collection, observation window closed.
3. Write the learning artifact: what changed, what happened, confirmed or refuted, why, next hypothesis.
4. Update intent.md with last_learning and reset stage to intent for the next cycle.
5. Form the next hypothesis: if a navigation signal drove this lift, does the headline narrative also need to change?
What Comes Next
The deployment selector fixed a navigation gap — visitors no longer had to discover K8s and AWS support on their own. But the hero narrative still leads with cost messaging: "Cut costs by 20x." The +210% lift suggests self-hosting-intent visitors are highly responsive to even small signals of production readiness. If a four-word selector row drives this much lift, what happens if the headline itself leads with self-hosting conviction?
Next candidate hypothesis
This hypothesis feeds back into CF-01 (Intent) and restarts the loop. The baseline is no longer 9.7% — it is 30.0%. Every future experiment starts from evidence, not opinion.
The entire loop — intent shaping, hypothesis design, flag spec, event tracking, Bayesian analysis, and learning closure — was guided by Claude Code agent skills defined in featbit/featbit-release-decision-agent. The agent replaces the need for a dedicated experiment platform or data team.