
LaunchDarkly Experiment Setup
Design and operate LaunchDarkly experiments—metrics, treatments, flag config, iterations, and winners—via the hosted LaunchDarkly MCP tools.
Overview
LaunchDarkly Experiment Setup is an agent skill that creates, runs, and concludes LaunchDarkly experiments through the official MCP server. It is most often used in the Grow phase and also fits Validate and Ship.
Install
npx skills add https://github.com/launchdarkly/agent-skills --skill launchdarkly-experiment-setup
What is this skill?
- End-to-end experiment setup: hypothesis, metrics, treatments, and flag configuration
- Uses MCP tools `create-experiment`, `start-experiment-iteration`, and `get-experiment` as the required core
- Supports evolving design between iterations and swapping treatments when `mutableFieldsByStatus` allows
- Optional browse and update flows via `list-experiments` and `update-experiment`
- Requires remotely hosted LaunchDarkly MCP server (Apache-2.0 skill, v0.2.0)
- 3 required MCP tools: create-experiment, start-experiment-iteration, get-experiment
- 2 optional MCP tools: list-experiments, update-experiment
- Skill metadata version 0.2.0
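As a rough illustration of how the core tools line up with iteration status, the sketch below maps each status named in the skill (`not_started`, `running`, `stopped`) to the tool an agent would plausibly call next. The tool names and statuses come from the skill description; the `next_tool` helper and the mapping itself are hypothetical and not part of the LaunchDarkly MCP server.

```python
from typing import Optional

# Hypothetical mapping (an assumption for illustration): iteration status -> next MCP tool.
# Tool names and status values are taken from the skill text; the pairing is a sketch.
NEXT_TOOL = {
    None: "create-experiment",                    # no experiment exists yet
    "not_started": "start-experiment-iteration",  # draft iteration, begin collecting data
    "running": "get-experiment",                  # monitor while data accumulates
    "stopped": "get-experiment",                  # read back the final state
}


def next_tool(status: Optional[str]) -> str:
    """Return the tool name to call for the given iteration status."""
    if status not in NEXT_TOOL:
        raise ValueError(f"unknown iteration status: {status!r}")
    return NEXT_TOOL[status]
```

Keeping the mapping in a plain dictionary makes it easy to see that only the three required tools are needed to create, start, and observe an experiment; stopping or editing an iteration would use the optional tools instead.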
Adoption & trust: 1k installs on skills.sh; 16 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have feature flags but no disciplined way to stand up metrics, treatments, and iterations in LaunchDarkly without missing steps or editing the wrong iteration fields.
Who is it for?
Indie SaaS and product teams already on LaunchDarkly who want the agent to drive experiment lifecycle calls through MCP with correct iteration semantics.
Skip if: Builders without LaunchDarkly MCP configured, offline-only products, or teams that only need static flag rollouts with no measurement plan.
When should I use this skill?
User wants to set up, run, or conclude experiments in LaunchDarkly with metrics, treatments, iterations, and flag config via MCP.
What do I get? / Deliverables
You get a created experiment with a started iteration (or clear status), validated metrics and treatments, and a documented stop decision with a winner when data supports it.
- Experiment definition with hypothesis, metrics, treatments, and flag mapping
- Started or status-checked iteration with observable experiment state
- Stop decision documenting winning treatment when iteration completes
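The "validated treatments" part of these deliverables can be sketched as a pre-flight check run before calling `create-experiment`. The two rules it enforces come from the skill text (allocation percentages sum to 100, exactly one treatment has `baseline: true`); the `validate_treatments` function and the plain-dictionary treatment shape are assumptions for illustration, not the MCP tool's actual input schema.

```python
from typing import Dict, List


def validate_treatments(treatments: List[Dict]) -> List[str]:
    """Return a list of problems; an empty list means the treatments look valid.

    Assumes each treatment is a dict carrying `allocationPercent` and,
    for the control, `baseline: True` (field names taken from the skill text).
    """
    problems: List[str] = []

    # Rule 1: allocation percentages across treatments should sum to 100.
    total = sum(t.get("allocationPercent", 0) for t in treatments)
    if abs(total - 100) > 1e-9:
        problems.append(f"allocationPercent sums to {total}, expected 100")

    # Rule 2: exactly one treatment must be the baseline (control).
    baseline_count = sum(1 for t in treatments if t.get("baseline") is True)
    if baseline_count != 1:
        problems.append(
            f"expected exactly one baseline treatment, found {baseline_count}"
        )

    return problems
```

For example, a 50/50 split with a single baseline returns no problems, while a 60/30 split with two baselines returns one message per violated rule.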
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Grow is the canonical shelf because the skill’s end state is measured iteration, data collection, and picking a winning treatment—not initial feature coding. Analytics subphase fits hypothesis-driven experiments, metric definitions, and iteration outcomes tied to product flags.
Where it fits
Compare two onboarding treatments behind a flag before committing to full build scope.
Start an iteration on launch-day pricing copy variants tied to the same production flag key.
Run a multi-week experiment on checkout layout metrics and stop when one treatment wins clearly.
Iterate activation email CTA treatments between stopped and restarted iterations without breaking flag config.
How it compares
LaunchDarkly MCP workflow for experiments—not generic analytics dashboards or hand-written A/B test code outside the flag platform.
Common Questions / FAQ
Who is launchdarkly-experiment-setup for?
Solo builders and small teams shipping with LaunchDarkly feature flags who want agent-guided experiment creation, iteration control, and winner selection using documented MCP tools.
When should I use launchdarkly-experiment-setup?
Use it during Validate when testing positioning or onboarding variants; in Ship when gating a release behind measured treatments; and in Grow when optimizing conversion, retention, or pricing flows with flag-backed experiments.
Is launchdarkly-experiment-setup safe to install?
It instructs the agent to call remote LaunchDarkly APIs via MCP; review the Security Audits panel on this page, verify MCP credentials scope, and avoid exposing production secrets in chat logs.
SKILL.md
# LaunchDarkly Experiment Setup

You're using a skill that guides you through setting up and running experiments in LaunchDarkly. Your job is to design the experiment, create it with the right metrics, treatments, and flag config, start data collection, evolve the design between iterations when needed, and stop with a winner.

## Prerequisites

This skill requires the remotely hosted LaunchDarkly MCP server to be configured in your environment.

**Required MCP tools:**

- `create-experiment`: create a new experiment with its initial iteration (hypothesis, metrics, treatments, flag config).
- `start-experiment-iteration`: begin collecting data for an experiment's current draft iteration.
- `get-experiment`: check experiment status, treatments, metrics, and current iteration.

**Optional MCP tools:**

- `list-experiments`: browse existing experiments in the project.
- `update-experiment`: update fields on the experiment or its current iteration. Honours `mutableFieldsByStatus`, so what's editable depends on whether the iteration is `not_started`, `running`, or `stopped`. Returns rejected inputs under `skipped`.
- `save-and-start-experiment-iteration`: the API-recommended way to change locked fields on a running experiment. Stops the current iteration, creates a new draft with the supplied field updates, and starts it in one call.
- `stop-experiment-iteration`: stop the running iteration. You must declare a winner: pass the `winningTreatmentId` (and a `winningReason`). If no variation outperformed, pick the baseline/control as the winner.
- `list-metrics`, `create-metric`, `list-metric-events`: manage metrics referenced by the experiment.

## Core Concepts

### What Are Experiments?

Experiments in LaunchDarkly measure the impact of feature flag variations on key metrics. An experiment consists of:

- **Treatments**: the flag variations being compared (control vs. test). Each treatment has an `allocationPercent`; the values across treatments should sum to 100.
- **Metrics**: what you're measuring (conversion rate, latency, revenue, etc.). One must be the primary metric.
- **Flag config**: the `flagKey`, `ruleId`, and `flagConfigVersion` of the targeting rule that drives the experiment.
- **Iteration**: a single data-collection window. Created in `not_started` status, becomes `running` when started, transitions to `stopped` when ended.
- **Holdout** (optional): a project-level group of users excluded from the experiment for baseline measurement (`holdoutId`).

### Experiment Lifecycle

1. **Create** the experiment with its first iteration (`create-experiment`).
2. **Start the iteration** to begin data collection (`start-experiment-iteration`).
3. **Monitor** results as data accumulates (`get-experiment`).
4. **Evolve the design** mid-experiment if needed: change locked fields like `treatments`, `metrics`, or `methodology` by calling `save-and-start-experiment-iteration`, which stops the current iteration, creates a new draft with your changes, and starts it.
5. **Stop the iteration** when you have a winner or a clear call (`stop-experiment-iteration`).
6. **Ship** the winning variation.

## Core Principles

1. **Metrics first**: ensure the metrics you'll reference exist before creating the experiment.
2. **Clear hypothesis**: every iteration requires a `hypothesis` string; state what you expect to improve and by how much.
3. **Proper controls**: exactly one treatment must have `baseline: true`.
4. **Sufficient sample size**: let iterations run long enough for statistical significance.
5. **One change at a time**: test one variable per