Release Decision Agent · Evidence-Backed Releases · FeatBit Dogfooding

Release Decision Agent

A new kind of tool that only became possible with large language models — an AI agent that guides any developer through rigorous release decisions, without a data team or experiment platform.

  • Every release is a hypothesis, not a plan. Writing the hypothesis before building is what separates evidence-backed releases from ship-and-pray.
  • Feature flags are the control plane for every release decision — they make experiments reversible, exposure controllable, and rollback instant.
  • Bayesian A/B testing gives builders a structured way to interpret results without a dedicated data team — even with small samples, when effect sizes are large.
  • Every experiment ends with an explicit CONTINUE, PAUSE, or ROLLBACK decision — and a written learning that becomes the input to the next cycle.
Works best when…
  • You release frequently and need a repeatable decision structure
  • You want to learn from every release, not just measure deployments
  • Your team has the discipline to write hypotheses before building
  • You can track a meaningful conversion metric end-to-end
  • Rollback capability matters — business risk is real
Skip it when…
  • You release once a quarter and changes are largely uncontroversial
  • Traffic is too low to reach statistical significance on any metric
  • There is no agreed north-star metric for the team to align on
  • Speed is the only objective and reversibility has no value
  • Your organization cannot act on what the evidence says

A new category

This did not exist before large language models.

Before LLMs, running a product experiment required a dedicated stack: an experiment platform (Optimizely, Statsig, or PostHog), an analytics engineer to instrument events, a data analyst to interpret results, and a product manager to translate findings into the next decision. Teams that could not afford that stack shipped features without evidence and learned nothing.

The 10× code velocity that LLMs enable makes this problem worse, not better. Teams ship faster than ever — but without a faster feedback loop, shipping faster just means accumulating wrong assumptions faster. The bottleneck moved from writing code to knowing whether code worked.

FeatBit Release Decision Agent is our answer to that bottleneck. It is not a dashboard. It is not a data platform. It is an AI agent — built on Claude Code skills — that guides any developer through the full release decision loop: forming a hypothesis, gating a change behind a feature flag, defining what to measure, collecting evidence, making a CONTINUE / PAUSE / ROLLBACK decision, and closing with a written learning that seeds the next iteration.

The result is that a solo developer or a two-person team can now run rigorous, Bayesian-grounded experiments on their own product — the same process that enterprise teams at Booking.com and Airbnb run with dedicated experimentation infrastructure. The agent replaces the data team, not the rigor.
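The Bayesian analysis the agent leans on is simple enough to sketch in a few lines. The following is a minimal illustration of a Beta-Binomial comparison, not the agent's actual implementation: it assumes uniform Beta(1, 1) priors and uses made-up conversion counts.

```python
import random

def prob_variant_beats_control(control_conv, control_n, variant_conv, variant_n,
                               draws=100_000, seed=42):
    """Monte Carlo estimate of P(variant rate > control rate) under
    uniform Beta(1, 1) priors (the Beta-Binomial model)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each arm: Beta(successes + 1, failures + 1)
        c = rng.betavariate(control_conv + 1, control_n - control_conv + 1)
        v = rng.betavariate(variant_conv + 1, variant_n - variant_conv + 1)
        if v > c:
            wins += 1
    return wins / draws

# Illustrative counts, not FeatBit's real data: 12/400 vs 31/410 conversions.
p = prob_variant_beats_control(12, 400, 31, 410)
print(f"P(variant > control) = {p:.3f}")
```

With an effect this large the posterior probability is decisive even at a few hundred users per arm — which is why large effect sizes make small-sample decisions workable.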

This is a new category of software infrastructure. It exists because LLMs made it possible. FeatBit is building it.

FeatBit Release Decision Agent · Claude Code Skills · Open Source

This framework runs as an AI agent — not a platform you configure

The release decision loop described here is implemented as a set of Claude Code agent skills in featbit/featbit-release-decision-agent. Instead of a dashboard you need a data team to interpret, an AI agent guides any developer through CF-01 to CF-08 — from intent to learning — using FeatBit feature flags and Bayesian analysis. The goal: make rigorous experimentation accessible at 10× code velocity, replacing the need for dedicated experiment vendors like Optimizely, Statsig, or PostHog.
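The shape of that loop can be sketched as a gated rollout that ends in an explicit decision. This is an illustrative stand-in, not FeatBit's SDK: `flag_enabled` is a hypothetical percentage-rollout check, and the 0.95 / 0.20 thresholds are example values, not the agent's defaults.

```python
import hashlib

def flag_enabled(flag_key: str, user_id: str, rollout_pct: int) -> bool:
    """Hypothetical percentage rollout: hash the user into a stable
    bucket 0-99 so exposure is deterministic per user."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < rollout_pct

def decide(p_variant_beats_control: float, ship_at=0.95, kill_at=0.20) -> str:
    """Map posterior evidence to the loop's explicit, time-boxed decision."""
    if p_variant_beats_control >= ship_at:
        return "CONTINUE"   # evidence supports the hypothesis: widen exposure
    if p_variant_beats_control <= kill_at:
        return "ROLLBACK"   # evidence contradicts it: turn the flag off
    return "PAUSE"          # inconclusive: keep collecting evidence

# Gate the change behind a flag at 10% exposure, then decide on evidence.
shown = flag_enabled("new-homepage-cta", "user-123", rollout_pct=10)
print(decide(0.998))
```

The point of the sketch is the control plane: because exposure runs through the flag, ROLLBACK is a configuration change rather than a deploy.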

View on GitHub

The Four Pillars of a Release Decision Engine

Every release that runs through this framework touches all four pillars. Skip one and the loop breaks.

Hypothesis

Every release starts with a falsifiable hypothesis.

Reversibility

Feature flags make every release reversible in seconds.

Measurement

One north-star metric per experiment, defined before coding begins.

Decision

CONTINUE, PAUSE, or ROLLBACK — evidence-backed, time-boxed, explicit.

10 Deep Dives: Framework, Evidence, and Practice

Start with the dogfooding case study to see the full loop in action, then read the framework articles in any order.

01 · 11 min read

How We Increased Enterprise Inquiries by 210% in 2 Weeks

A complete dogfooding case study: FeatBit used its own feature flags and Bayesian A/B testing to validate a homepage change that tripled enterprise contact clicks with 99.8% confidence.

Case Study · Bayesian · Dogfooding
02 · 8 min read

The Release Decision Loop: From Intent to Learning

The eight-stage loop that turns every release into a structured decision: Intent → Hypothesis → Reversibility → Exposure → Measurement → Evidence → Decision → Learning.

Framework · Loop · Process
03 · 7 min read

Write the Hypothesis Before You Build

The discipline that separates evidence-backed releases from opinion-driven ship-and-pray. A reusable template and worked examples for writing falsifiable hypotheses.

Hypothesis · Discipline · Before You Ship
04 · 8 min read

Feature Flags Are Release Decision Infrastructure

Why feature flags are not just toggles — they are the control plane for reversibility, targeting, exposure control, evidence collection, and rollback.

Feature Flags · Infrastructure · Reversibility
05 · 10 min read

Bayesian A/B Testing Without a Data Team

Beta-Binomial explained for builders: how to read results, choose confidence thresholds, and make decisions with small samples when the effect size is large.

Bayesian · A/B Testing · No Data Team
06 · 8 min read

What to Measure When You Can't Measure Everything

One north-star metric per experiment, defined before coding begins. How to handle attribution gaps, design trackable events, and select proxy metrics.

Measurement · Metrics · Event Design
07 · 7 min read

Reversible by Default: How Feature Flags Prevent Release Disasters

Every release is a bet. Feature flags make bets reversible. A framework for treating ROLLBACK as a first-class decision — not a failure.

Reversibility · Risk · ROLLBACK
08 · 9 min read

Progressive Rollout Patterns for Production Safety

Five rollout patterns — internal-first, canary, percentage-based, segment-targeted, time-gated — with decision criteria for when to advance, pause, or abandon.

Progressive Rollout · Canary · Production Safety
09 · 8 min read

Release Decisions in the AI Era

How AI features change the release decision loop: faster iteration cycles, qualitative signals, safety guardrails for LLMs, and feature flags for agentic workflows.

AI Features · LLM · Safety Guardrails
10 · 7 min read

Why Dogfooding Release Decisions Changes How You Build

Using your own tooling changes what you prioritize, how you document, and how fast you iterate. The credibility advantage and feedback loop that come from eating your own cooking.

Dogfooding · Product Culture · Iteration

Run your first AI-guided release decision

Start with the dogfooding case study to see the full agent loop in action, or install the skills from featbit-release-decision-agent and run your first experiment today.