Release Decision Agent
A new kind of tool that only became possible with large language models — an AI agent that guides any developer through rigorous release decisions, without a data team or experiment platform.
TL;DR
- Every release is a hypothesis, not a plan. Writing the hypothesis before building is what separates evidence-backed releases from ship-and-pray.
- Feature flags are the control plane for every release decision — they make experiments reversible, exposure controllable, and rollback instant.
- Bayesian A/B testing gives builders a structured way to interpret results without a dedicated data team — even with small samples, when effect sizes are large.
- Every experiment ends with an explicit CONTINUE, PAUSE, or ROLLBACK decision — and a written learning that becomes the input to the next cycle.
Use this framework when:
- You release frequently and need a repeatable decision structure
- You want to learn from every release, not just measure deployments
- Your team has the discipline to write hypotheses before building
- You can track a meaningful conversion metric end-to-end
- Rollback capability matters — business risk is real

Skip it when:
- You release once a quarter and changes are largely uncontroversial
- Traffic is too low to reach statistical significance on any metric
- There is no agreed north-star metric for the team to align on
- Speed is the only objective and reversibility has no value
- Your organization cannot act on what the evidence says
A new category
This did not exist before large language models.
Before LLMs, running a product experiment required a dedicated stack: an experiment platform (Optimizely, Statsig, or PostHog), an analytics engineer to instrument events, a data analyst to interpret results, and a product manager to translate findings into the next decision. Teams that could not afford that stack shipped features without evidence and learned nothing.
The 10× code velocity that LLMs enable makes this problem worse, not better. Teams ship faster than ever — but without a faster feedback loop, shipping faster just means accumulating wrong assumptions faster. The bottleneck moved from writing code to knowing whether code worked.
FeatBit Release Decision Agent is our answer to that bottleneck. It is not a dashboard. It is not a data platform. It is an AI agent — built on Claude Code skills — that guides any developer through the full release decision loop: forming a hypothesis, gating a change behind a feature flag, defining what to measure, collecting evidence, making a CONTINUE / PAUSE / ROLLBACK decision, and closing with a written learning that seeds the next iteration.
The result is that a solo developer or a two-person team can now run rigorous, Bayesian-grounded experiments on their own product — the same process that enterprise teams at Booking.com and Airbnb run with dedicated experimentation infrastructure. The agent replaces the data team, not the rigor.
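The Bayesian core of that process is small enough to sketch. Here is a minimal, illustrative Beta-Binomial comparison in Python (not FeatBit's actual implementation; function and parameter names are ours) that turns raw conversion counts for control and variant into the probability the variant wins, using uniform Beta(1, 1) priors and Monte Carlo sampling:

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, samples=200_000, seed=7):
    """Estimate P(rate_B > rate_A) under Beta(1, 1) priors.

    conv_a/conv_b: conversions observed in control (A) and variant (B).
    n_a/n_b:       visitors exposed to each side.
    Posterior for each rate is Beta(1 + conversions, 1 + non-conversions);
    we sample both posteriors and count how often B's draw exceeds A's.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(samples):
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if b > a:
            wins += 1
    return wins / samples

# e.g. control converts 10/1000, variant 30/1000: a large effect,
# so even this small sample yields high confidence that B wins.
confidence = prob_b_beats_a(10, 1000, 30, 1000)
```

This is why the TL;DR can claim decisions are possible "even with small samples, when effect sizes are large": a 1% to 3% jump at n = 1,000 per arm already produces a posterior win probability well above typical 95% thresholds.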
This is a new category of software infrastructure. It exists because LLMs made it possible. FeatBit is building it.
This framework runs as an AI agent — not a platform you configure
The release decision loop described here is implemented as a set of Claude Code agent skills in featbit/featbit-release-decision-agent. Instead of a dashboard you need a data team to interpret, an AI agent guides any developer through CF-01 to CF-08 — from intent to learning — using FeatBit feature flags and Bayesian analysis. The goal: make rigorous experimentation accessible at 10× code velocity, replacing the need for dedicated experiment vendors like Optimizely, Statsig, or PostHog.
The Four Pillars of a Release Decision Engine
Every release that runs through this framework touches all four pillars. Skip one and the loop breaks.
Hypothesis
Every release starts with a falsifiable hypothesis.
Reversibility
Feature flags make every release reversible in seconds.
Measurement
One north-star metric per experiment, defined before coding begins.
Decision
CONTINUE, PAUSE, or ROLLBACK — evidence-backed, time-boxed, explicit.
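The Decision pillar can be made concrete with a short sketch. The names, fields, and thresholds below are illustrative assumptions, not part of the FeatBit SDK: they show how an experiment definition plus Bayesian evidence maps onto the three explicit outcomes.

```python
from dataclasses import dataclass

@dataclass
class Experiment:
    flag_key: str            # the feature flag gating the change
    hypothesis: str          # falsifiable statement, written before coding
    north_star_metric: str   # exactly one metric per experiment
    confidence_threshold: float = 0.95

def decide(exp: Experiment, p_variant_wins: float, p_variant_loses: float) -> str:
    """Map posterior evidence to an explicit, time-boxed release decision."""
    if p_variant_wins >= exp.confidence_threshold:
        return "CONTINUE"   # evidence supports the hypothesis: keep rolling out
    if p_variant_loses >= exp.confidence_threshold:
        return "ROLLBACK"   # evidence refutes it: flip the flag off, in seconds
    return "PAUSE"          # inconclusive: hold exposure, gather more evidence

exp = Experiment(
    flag_key="new-homepage-hero",
    hypothesis="The new hero copy increases enterprise contact clicks",
    north_star_metric="contact_click_rate",
)
decision = decide(exp, p_variant_wins=0.998, p_variant_loses=0.002)  # "CONTINUE"
```

The point of the sketch is that every branch returns a named decision: there is no silent default, which is what makes the loop auditable and turns ROLLBACK into a first-class outcome rather than a failure.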
10 Deep Dives: Framework, Evidence, and Practice
Start with the dogfooding case study to see the full loop in action, then read the framework articles in any order.
How We Increased Enterprise Inquiries by 210% in 2 Weeks
A complete dogfooding case study: FeatBit used its own feature flags and Bayesian A/B testing to validate a homepage change that tripled enterprise contact clicks with 99.8% confidence.
The Release Decision Loop: From Intent to Learning
The eight-stage loop that turns every release into a structured decision: Intent → Hypothesis → Reversibility → Exposure → Measurement → Evidence → Decision → Learning.
Write the Hypothesis Before You Build
The discipline that separates evidence-backed releases from opinion-driven ship-and-pray. A reusable template and worked examples for writing falsifiable hypotheses.
Feature Flags Are Release Decision Infrastructure
Why feature flags are not just toggles — they are the control plane for reversibility, targeting, exposure control, evidence collection, and rollback.
Bayesian A/B Testing Without a Data Team
Beta-Binomial explained for builders: how to read results, choose confidence thresholds, and make decisions with small samples when the effect size is large.
What to Measure When You Can't Measure Everything
One north-star metric per experiment, defined before coding begins. How to handle attribution gaps, design trackable events, and select proxy metrics.
Reversible by Default: How Feature Flags Prevent Release Disasters
Every release is a bet. Feature flags make bets reversible. A framework for treating ROLLBACK as a first-class decision — not a failure.
Progressive Rollout Patterns for Production Safety
Five rollout patterns — internal-first, canary, percentage-based, segment-targeted, time-gated — with decision criteria for when to advance, pause, or abandon.
Release Decisions in the AI Era
How AI features change the release decision loop: faster iteration cycles, qualitative signals, safety guardrails for LLMs, and feature flags for agentic workflows.
Why Dogfooding Release Decisions Changes How You Build
Using your own tooling changes what you prioritize, how you document, and how fast you iterate. The credibility advantage and feedback loop that come from eating your own cooking.
Run your first AI-guided release decision
Start with the dogfooding case study to see the full agent loop in action, or install the skills from featbit-release-decision-agent and run your first experiment today.