Launchdarkly Experiment Setup

Name: Launchdarkly Experiment Setup
Author: launchdarkly

launchdarkly/agent-skills

2.4k installs
20 repo stars
Updated July 27, 2026
launchdarkly/agent-skills

launchdarkly-experiment-setup is an agent skill for "Set up and run experiments in LaunchDarkly. Create experiments with metrics, treatments, and flag config, start iterati

About

The launchdarkly-experiment-setup skill "Set up and run experiments in LaunchDarkly. Create experiments with metrics, treatments, and flag config, start iterations to collect data, swap design between iterations, and stop with a winner." It covers create-experiment - create a new experiment with its initial iteration hypothesis, metrics, treatments, flag config .. Key workflows include start-experiment-iteration - begin collecting data for an experiment's current draft iteration.. Step 1: Prepare Metrics 1. Use list-metrics to find existing metrics. 2. If you need a new one, use create-metric and note the key. 3. Decide which is the primary metric a single metric or a funnel group . You'll pass its key as primarySingleMetricKey or primaryFunnelKey on the iteration. | Goal | Metric type | Example key | |------|-------------|-------------| | Conversion | Custom conversion | c Developers invoke launchdarkly-experiment-setup when the task matches the triggers and reference files in SKILL.md for grounded, stepwise execution.

create-experiment - create a new experiment with its initial iteration hypothesis, metrics, treatments, flag config .
start-experiment-iteration - begin collecting data for an experiment's current draft iteration.
get-experiment - check experiment status, treatments, metrics, and current iteration.
list-experiments - browse existing experiments in the project.
list-metrics , create-metric , list-metric-events - manage metrics referenced by the experiment.

Launchdarkly Experiment Setup by the numbers

2,414 all-time installs (skills.sh)
+174 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #48 of 854 Sales & Marketing skills by installs in the Skillselion catalog
Security screen: LOW risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

launchdarkly-experiment-setup capabilities & compatibility

Capabilities: create experiment create a new experiment with · start experiment iteration begin collecting da · get experiment check experiment status, treatm · list experiments browse existing experiments i · list metrics , create metric , list metric event
Use cases: documentation

From the docs

What launchdarkly-experiment-setup says it does

compatibility: Requires the remotely hosted LaunchDarkly MCP server

SKILL.md

This skill requires the remotely hosted LaunchDarkly MCP server to be configured in your environment.

SKILL.md

npx skills add https://github.com/launchdarkly/agent-skills --skill launchdarkly-experiment-setup

Add your badge

Show developers this skill is listed on Skillselion. Paste this into your README.

[![Listed on Skillselion](https://skillselion.com/badge/skills/launchdarkly/agent-skills/launchdarkly-experiment-setup.svg)](https://skillselion.com/skills/launchdarkly/agent-skills/launchdarkly-experiment-setup)

Installs	2.4k
repo stars	★ 20
Security audit	3 / 3 scanners passed
Last updated	July 27, 2026
Repository	launchdarkly/agent-skills ↗

What problem does launchdarkly-experiment-setup solve for developers using the documented workflows?

"Set up and run experiments in LaunchDarkly. Create experiments with metrics, treatments, and flag config, start iterations to collect data, swap design between iterations, and stop with a winner."

Who is it for?

Developers working with launchdarkly-experiment-setup patterns described in the skill documentation.

Skip if: Skip when docs are empty or the task is outside the skill documented scope.

When should I use this skill?

Use when "Set up and run experiments in LaunchDarkly. Create experiments with metrics, treatments, and flag config, start iterations to collect data, swap design between iterations, and sto

What you get

Actionable launchdarkly-experiment-setup guidance grounded in SKILL.md workflows and reference files.

Experiment definitions
Iteration configs
Winner conclusion records

By the numbers

Skill version 0.2.0 under Apache-2.0 license
Requires the remotely hosted LaunchDarkly MCP server

Files

SKILL.mdMarkdownGitHub ↗

LaunchDarkly Experiment Setup

You're using a skill that guides you through setting up and running experiments in LaunchDarkly. Your job is to design the experiment, create it with the right metrics, treatments, and flag config, start data collection, evolve the design between iterations when needed, and stop with a winner.

Prerequisites

This skill requires the remotely hosted LaunchDarkly MCP server to be configured in your environment.

Required MCP tools:

create-experiment — create a new experiment with its initial iteration (hypothesis, metrics, treatments, flag config).
start-experiment-iteration — begin collecting data for an experiment's current draft iteration.
get-experiment — check experiment status, treatments, metrics, and current iteration.

Optional MCP tools:

list-experiments — browse existing experiments in the project.
update-experiment — update fields on the experiment or its current iteration. Honours mutableFieldsByStatus, so what's editable depends on whether the iteration is not_started, running, or stopped. Returns rejected inputs under skipped.
save-and-start-experiment-iteration — the API-recommended way to change locked fields on a running experiment. Stops the current iteration, creates a new draft with the supplied field updates, and starts it in one call.
stop-experiment-iteration — stop the running iteration. You must declare a winner: pass the winningTreatmentId (and a winningReason). If no variation outperformed, pick the baseline/control as the winner.
list-metrics, create-metric, list-metric-events — manage metrics referenced by the experiment.

Core Concepts

What Are Experiments?

Experiments in LaunchDarkly measure the impact of feature flag variations on key metrics. An experiment consists of:

Treatments: the flag variations being compared (control vs. test). Each treatment has an allocationPercent; the values across treatments should sum to 100.
Metrics: what you're measuring (conversion rate, latency, revenue, etc.). One must be the primary metric.
Flag config: the flagKey, ruleId, and flagConfigVersion of the targeting rule that drives the experiment.
Iteration: a single data-collection window. Created in not_started status, becomes running when started, transitions to stopped when ended.
Holdout (optional): a project-level group of users excluded from the experiment for baseline measurement (holdoutId).

Experiment Lifecycle

1. Create the experiment with its first iteration (create-experiment). 2. Start the iteration to begin data collection (start-experiment-iteration). 3. Monitor results as data accumulates (get-experiment). 4. Evolve the design mid-experiment if needed — change locked fields like treatments, metrics, or methodology by calling save-and-start-experiment-iteration, which stops the current iteration, creates a new draft with your changes, and starts it. 5. Stop the iteration when you have a winner or a clear call (stop-experiment-iteration). 6. Ship the winning variation.

Core Principles

1. Metrics first: ensure the metrics you'll reference exist before creating the experiment. 2. Clear hypothesis: every iteration requires a hypothesis string; state what you expect to improve and by how much. 3. Proper controls: exactly one treatment must have baseline: true. 4. Sufficient sample size: let iterations run long enough for statistical significance. 5. One change at a time: test one variable per experiment for clear attribution.

Workflow

Step 1: Prepare Metrics

1. Use list-metrics to find existing metrics. 2. If you need a new one, use create-metric and note the key. 3. Decide which is the primary metric (a single metric or a funnel group). You'll pass its key as primarySingleMetricKey or primaryFunnelKey on the iteration.

Goal	Metric type	Example key
Conversion	Custom conversion	`checkout-completed`
Performance	Custom numeric	`page-load-time-ms`
Engagement	Custom conversion	`feature-clicked`
Revenue	Custom numeric	`order-value`

Step 2: Identify the Targeting Rule

You need the ruleId and current flagConfigVersion of the flag rule that will drive the experiment. Use get-flag on the flag (or its environment-scoped status) to find them. The fallthrough rule's id is the string "fallthrough".

Step 3: Create the Experiment

Call create-experiment. The top-level fields describe the experiment; the nested iteration object describes the first data-collection window.

{
  "projectKey": "my-project",
  "environmentKey": "production",
  "key": "checkout-flow-v2-experiment",
  "name": "Checkout Flow v2 Experiment",
  "description": "Compare the redesigned checkout against the current flow.",
  "tags": ["growth", "checkout"],
  "methodology": "bayesian",
  "iteration": {
    "hypothesis": "The redesigned checkout will lift completion rate by 3%.",
    "primarySingleMetricKey": "checkout-completed",
    "metrics": [
      { "key": "checkout-completed" },
      { "key": "checkout-time-seconds" }
    ],
    "treatments": [
      {
        "name": "Control",
        "baseline": true,
        "allocationPercent": 50,
        "parameters": [
          { "flagKey": "checkout-flow-v2", "variationId": "variation-a-id" }
        ]
      },
      {
        "name": "New Checkout",
        "baseline": false,
        "allocationPercent": 50,
        "parameters": [
          { "flagKey": "checkout-flow-v2", "variationId": "variation-b-id" }
        ]
      }
    ],
    "flags": {
      "checkout-flow-v2": {
        "ruleId": "fallthrough",
        "flagConfigVersion": 7
      }
    },
    "randomizationUnit": "user"
  }
}

Useful optional top-level fields:

holdoutId — attach an existing holdout.
dataSource — "launchdarkly" (default), "snowflake", or "databricks".
methodology — "bayesian" (default), "frequentist", or "export_only".
analysisConfig — set thresholds, multiple-comparison correction, or sequential testing.

Useful optional iteration fields:

attributes — array of context attribute keys to slice results by (e.g. ["country", "device"]).
covariateId — covariate CSV id for stratified sampling.
canReshuffleTraffic — defaults to true; set false to lock users to their initial variation when allocations change.

Step 4: Start Data Collection

{
  "projectKey": "my-project",
  "environmentKey": "production",
  "experimentKey": "checkout-flow-v2-experiment"
}

Before starting, the API requires that:

the flag is toggled on,
the iteration has a randomizationUnit, and
at least one treatment has a non-zero allocationPercent.

Pass changeJustification if you're restarting after a prior iteration was stopped.

Step 5: Verify

1. Call get-experiment and confirm currentIteration.status === "running". 2. Check that treatments are present with the expected allocations. 3. Check the metric list and the primary metric.

Step 6: Evolve the Design Mid-Experiment (when needed)

Most structural fields (treatments, metrics, methodology, hypothesis, …) are locked while an iteration is running. Two ways to change them:

Light edits while running — update-experiment will let through anything mutableFieldsByStatus permits in the running state (typically just metadata like name, description, maintainerId, tags, plus appending metrics/attributes). It surfaces rejected fields under skipped with a reason.
Real design changes — call save-and-start-experiment-iteration. It stops the current iteration, creates a new draft with the supplied field updates applied, and starts it in one call. Inputs match update-experiment, plus changeJustification. Mutability is checked against not_started since updates land on the new draft.

Example: swap the treatment allocation and add a metric in a single call.

{
  "projectKey": "my-project",
  "environmentKey": "production",
  "experimentKey": "checkout-flow-v2-experiment",
  "changeJustification": "Lowering control allocation now that variant looks safe.",
  "treatments": [
    {
      "name": "Control",
      "baseline": true,
      "allocationPercent": 30,
      "parameters": [{ "flagKey": "checkout-flow-v2", "variationId": "variation-a-id" }]
    },
    {
      "name": "New Checkout",
      "baseline": false,
      "allocationPercent": 70,
      "parameters": [{ "flagKey": "checkout-flow-v2", "variationId": "variation-b-id" }]
    }
  ],
  "metrics": [
    { "key": "checkout-completed" },
    { "key": "checkout-time-seconds" },
    { "key": "checkout-error-rate" }
  ]
}

Step 7: Stop the Iteration

When you've reached significance or made a call, stop the iteration. A winning treatment is required to stop — LaunchDarkly does not let you end an iteration without declaring a winner. Pass the winning treatment's id (returned in get-experiment as _id on each treatment) plus a winningReason.

If the experiment was inconclusive or no variation beat the control, declare the baseline/control treatment as the winner and say so in winningReason (e.g. "Inconclusive — no significant lift, keeping control"). There is no "stop without a winner" path.

{
  "projectKey": "my-project",
  "environmentKey": "production",
  "experimentKey": "checkout-flow-v2-experiment",
  "winningTreatmentId": "treat-002",
  "winningReason": "Two weeks of data, +4.1% lift on the primary metric with PBBL > 95%."
}

Report results:

Iteration stopped with the declared winningTreatmentId (the control/baseline if inconclusive).
Lift / significance summary on the primary metric.
Next steps (ship the winner, roll back, or start a follow-up iteration).

Edge Cases

Situation	Action
Metric doesn't exist	Create it first with `create-metric`.
Flag has no variations to compare	Create flag variations before designing treatments.
You don't know the flag's `ruleId` / `flagConfigVersion`	Use `get-flag` or `get-flag-status-across-envs`. The fallthrough rule's id is the string `"fallthrough"`.
Experiment already exists	Use `list-experiments` to find it; `get-experiment` for details.
Need to change locked fields mid-experiment	Use `save-and-start-experiment-iteration` (single call) rather than stopping and recreating by hand.
`update-experiment` returns `skipped` for a field	Inspect the `currentStatus` and `allowedFields` in the response — that field isn't mutable in the current iteration status. Either stop the iteration first or use `save-and-start-experiment-iteration`.

What NOT to Do

Don't omit iteration on create-experiment — it's required.
Don't set baseline: true on more than one treatment.
Don't let allocationPercent values fail to sum to 100 across treatments.
Don't try to change locked iteration fields with update-experiment while the iteration is running — reach for save-and-start-experiment-iteration instead.
Don't stop iterations early — wait for statistical significance.
Don't run multiple experiments on the same flag at the same time without a careful holdout design.

Related skills

Sales EnablementGenerate high-quality sales collateral, pitch decks, one-pagers, objection handlers, demo scripts and playbooks that help close deals.82.2k42k

Churn PreventionDesign cancel flows, dunning, and health-score tactics to cut voluntary and failed-payment churn on a subscription product.81.9k42k

RevopsDesign and optimize revenue operations, lead lifecycle management, and marketing-to-sales handoff processes.78.9k42k

Cross Border EcommerceEvaluate and prioritize international markets before committing engineering and marketing resources to global expansion.61.8k498

Community MarketingGet structured, battle-tested guidance when choosing community platforms, diagnosing engagement problems, or scaling an early-stage user community.50.5k42k

Co MarketingSystematically identify and brainstorm campaigns with complementary products that share the same audience but solve different problems.41.9k42k

Forks & variants (1)

Launchdarkly Experiment Setup has 1 known copy in the catalog totaling 37 installs. They canonicalize to this original listing.

launchdarkly - 37 installs

How it compares

Use this when experimentation must live inside LaunchDarkly flag infrastructure rather than a standalone analytics or split-testing SaaS.

FAQ

Who is launchdarkly-experiment-setup for?

Developers and software engineers working with launchdarkly-experiment-setup patterns described in the skill documentation.

When should I use launchdarkly-experiment-setup?

When "Set up and run experiments in LaunchDarkly. Create experiments with metrics, treatments, and flag config, start iterations to collect data, swap design between iterations, and sto.

Is launchdarkly-experiment-setup safe to install?

Review the Security Audits panel on this page before installing in production.

Sales & Marketingpricing