
A/B Test Setup
Design statistically sound A/B tests and an experiment backlog so copy, pricing, and funnel changes produce decisions—not noise.
Overview
A/B Test Setup is an agent skill most often used in Grow (also Ship, Build) that plans A/B tests and growth experimentation programs with valid measurement and actionable hypotheses.
Install
npx skills add https://github.com/coreyhaines31/marketingskills --skill ab-test-setup
What is this skill?
- Covers single A/B tests and broader growth experimentation programs (backlog, ICE, playbook)
- Requires baseline conversion context before variant design
- Addresses statistical significance, runtime, and multivariate vs split-test scope
- Reads product marketing context first when available
- Points to analytics-tracking for implementation and page-cro for page-level optimization
- Skill metadata version 1.2.0
- ICE (Impact, Confidence, Ease) scoring for experiment backlog prioritization
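ICE scoring rates each backlog idea on Impact, Confidence, and Ease, then ranks the ideas by the combined score. The Python sketch below assumes 1-10 ratings combined by multiplication (some teams average the three instead). The `Experiment` class, the `ice_score` and `prioritize` helpers, and the sample backlog are hypothetical illustrations, not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    """One backlog idea with 1-10 ratings (hypothetical schema)."""
    name: str
    impact: int
    confidence: int
    ease: int


def ice_score(exp: Experiment) -> int:
    # Multiplicative ICE: a low rating on any one dimension drags the whole score down.
    for value in (exp.impact, exp.confidence, exp.ease):
        if not 1 <= value <= 10:
            raise ValueError(f"{exp.name}: ratings must be 1-10, got {value}")
    return exp.impact * exp.confidence * exp.ease


def prioritize(backlog: list[Experiment]) -> list[tuple[str, int]]:
    # Highest score first; sorted() is stable, so ties keep their backlog order.
    ranked = sorted(backlog, key=ice_score, reverse=True)
    return [(exp.name, ice_score(exp)) for exp in ranked]


# Hypothetical pricing-page ideas, for illustration only.
backlog = [
    Experiment("Annual-plan toggle default", impact=7, confidence=6, ease=9),
    Experiment("Pricing table redesign", impact=9, confidence=5, ease=3),
    Experiment("Social proof under CTA", impact=5, confidence=7, ease=8),
]

for name, score in prioritize(backlog):
    print(f"{score:4d}  {name}")
```

Because the ratings are multiplied, an idea that is very hard to ship (Ease of 1 or 2) cannot reach the top on Impact alone. Averaging is more forgiving of a single weak dimension.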
Adoption & trust: 52.4k installs on skills.sh; 32.4k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You are about to ship two versions of a change, but you lack a hypothesis, a baseline metric, a planned sample size, or a backlog process. Any “winner” would be guesswork.
Who is it for?
Founders with measurable funnel traffic (even modest) who will actually instrument events and wait for significance before rolling out winners.
Skip if: your team has no baseline data and no plan to instrument events (use analytics-tracking first), or the problem is better solved by a full-page CRO audit than by isolated variants.
When should I use this skill?
When the user mentions A/B tests, split tests, experiments, variant copy, statistical significance, growth experiments, an experiment backlog, ICE scores, or an experimentation program.
What do I get? / Deliverables
You get a test or program design (hypothesis, metrics, runtime guidance, and prioritization hooks). It is ready for tracking implementation via analytics-tracking or for broader page work via page-cro.
- Test hypothesis with primary and guardrail metrics
- Runtime and significance guidance appropriate to traffic
- Experiment program framing (backlog, ICE, velocity) when requested
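Runtime guidance comes down to simple arithmetic: divide the total visitors needed across all variants by the daily traffic that actually enters the test. The Python sketch below assumes an even split across variants. It also rounds up to whole weeks so every day of the week is equally represented, which is a common convention assumed here rather than a rule taken from the skill. `estimate_runtime_days` and the example inputs are hypothetical.

```python
import math


def estimate_runtime_days(
    required_per_variant: int,
    variants: int,
    daily_visitors: int,
    traffic_share: float = 1.0,
    round_to_weeks: bool = True,
) -> int:
    """Days until every variant reaches its planned sample size.

    Assumes enrolled traffic is split evenly across `variants`.
    """
    if variants < 2:
        raise ValueError("need a control and at least one variant")
    if daily_visitors <= 0 or not 0 < traffic_share <= 1:
        raise ValueError("daily_visitors must be positive and traffic_share in (0, 1]")
    enrolled_per_day = daily_visitors * traffic_share
    total_needed = required_per_variant * variants
    days = math.ceil(total_needed / enrolled_per_day)
    if round_to_weeks:
        # Run complete weekly cycles so weekdays and weekends are balanced.
        days = math.ceil(days / 7) * 7
    return days


# Hypothetical inputs: 12,000 visitors per variant, simple A/B, 2,000 visitors/day.
print(estimate_runtime_days(12_000, variants=2, daily_visitors=2_000))
```

If only half of all visitors are enrolled (`traffic_share=0.5`), the same test needs twice as many days before rounding. For that reason, low-traffic funnels are better served by fewer variants or larger expected lifts.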
Journey fit
Spans multiple journey phases: a primary shelf plus the alternate fits below.
Grow (Analytics) is the canonical shelf for growth experimentation: hypotheses, duration, and significance. Implementation often touches Build and Ship surfaces. The Analytics subphase fits ICE-scored backlogs, experiment velocity, and measurement discipline rather than one-off QA test cases.
Where it fits
Prioritize three pricing-page tests with ICE and define minimum runtime before calling winners.
Compare launch announcement CTAs with a guarded primary metric and rollback criteria.
Scope an onboarding step A/B that engineering can ship behind a simple flag.
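Shipping a variant behind a simple flag usually relies on deterministic assignment, so a returning user keeps seeing the same experience. The Python sketch below assumes a stable user ID is available. It hashes the experiment name together with that ID using SHA-256 from the standard library and maps the result to a variant. `assign_variant` is a hypothetical helper, not the API of any particular feature-flag tool.

```python
import hashlib


def assign_variant(
    user_id: str,
    experiment: str,
    variants: tuple[str, ...] = ("control", "treatment"),
) -> str:
    """Deterministically map a user to one variant of an experiment."""
    if not variants:
        raise ValueError("need at least one variant")
    # Salt with the experiment name so bucketing differs from test to test.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Read the first 8 bytes as an unsigned integer and reduce it to a bucket index.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# The same user and experiment always produce the same variant.
print(assign_variant("user-42", "onboarding-step-2"))
```

Salting the hash with the experiment name is a deliberate design choice. Without it, the same users would land in "treatment" for every experiment that uses the same variant list, which would correlate results across tests.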
How it compares
Focuses on experiment design and program discipline, not event plumbing (analytics-tracking) or whole-page conversion rewrites (page-cro).
Common Questions / FAQ
Who is ab-test-setup for?
Indie operators and small growth teams who own product copy and funnel metrics and need structured experiments without a dedicated data science team.
When should I use ab-test-setup?
In Grow when running analytics-driven experiments, at Ship when validating launch messaging variants, and in Build when comparing onboarding or UI implementations—whenever you need statistical significance or an experiment backlog, not for unit or integration QA.
Is ab-test-setup safe to install?
It is guidance-only with no required tooling permissions; verify package integrity using the Security Audits panel on this Prism page.
SKILL.md
# A/B Test Setup

You are an expert in experimentation and A/B testing. Your goal is to help design tests that produce statistically valid, actionable results.

## Initial Assessment

**Check for product marketing context first:**
If `.agents/product-marketing-context.md` exists (or `.claude/product-marketing-context.md` in older setups), read it before asking questions. Use that context and only ask for information not already covered or specific to this task.

Before designing a test, understand:

1. **Test Context** - What are you trying to improve? What change are you considering?
2. **Current State** - Baseline conversion rate? Current traffic volume?
3. **Constraints** - Technical complexity? Timeline? Tools available?

---

## Core Principles

### 1. Start with a Hypothesis
- Not just "let's see what happens"
- Specific prediction of outcome
- Based on reasoning or data

### 2. Test One Thing
- Single variable per test
- Otherwise you don't know what worked

### 3. Statistical Rigor
- Pre-determine sample size
- Don't peek and stop early
- Commit to the methodology

### 4. Measure What Matters
- Primary metric tied to business value
- Secondary metrics for context
- Guardrail metrics to prevent harm

---

## Hypothesis Framework

### Structure

```
Because [observation/data],
we believe [change]
will cause [expected outcome]
for [audience].
We'll know this is true when [metrics].
```

### Example

**Weak**: "Changing the button color might increase clicks."

**Strong**: "Because users report difficulty finding the CTA (per heatmaps and feedback), we believe making the button larger and using contrasting color will increase CTA clicks by 15%+ for new visitors. We'll measure click-through rate from page view to signup start."
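The "Pre-determine sample size" and "Don't peek and stop early" principles translate into two calculations. First, fix the per-variant sample size before launch. Then run one significance test when both arms reach that size. The Python sketch below uses the standard normal approximation for two independent proportions via `statistics.NormalDist`. The two-sided α = 0.05 and 80% power defaults are common conventions assumed here (the skill does not state them), and the function names and example counts are hypothetical. This formula can return somewhat larger numbers than the Quick Reference table, whose assumptions are not given, so cross-check against a dedicated calculator before committing.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(
    baseline: float, relative_lift: float, alpha: float = 0.05, power: float = 0.80
) -> int:
    """Visitors needed in EACH variant to detect `relative_lift` over `baseline`.

    Normal approximation for a two-sided test of two independent proportions.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie in (0, 1) and relative_lift must be non-zero")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (B vs. A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both arms at 0% or both at 100%: no observable difference.
        return 1.0
    z_stat = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))


# Plan before launch: hypothetical 5% baseline, targeting the +15% lift from the strong example.
n = sample_size_per_variant(baseline=0.05, relative_lift=0.15)
print(f"Collect {n} visitors per variant, then test once.")

# Evaluate once, after both arms reach n (hypothetical conversion counts).
p = two_proportion_p_value(conv_a=710, n_a=n, conv_b=820, n_b=n)
print("significant at alpha=0.05" if p < 0.05 else "no significant difference")
```

Pooling the two rates for the standard error is the textbook form of this z-test under the null hypothesis of no difference. If you check the p-value repeatedly and stop the first time it drops below 0.05, the false-positive rate inflates. That is why the sketch computes `n` first and tests only once.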
---

## Test Types

| Type | Description | Traffic Needed |
|------|-------------|----------------|
| A/B | Two versions, single change | Moderate |
| A/B/n | Multiple variants | Higher |
| MVT | Multiple changes in combinations | Very high |
| Split URL | Different URLs for variants | Moderate |

---

## Sample Size

### Quick Reference

| Baseline | 10% Relative Lift | 20% Relative Lift | 50% Relative Lift |
|----------|----------|----------|----------|
| 1% | 150k/variant | 39k/variant | 6k/variant |
| 3% | 47k/variant | 12k/variant | 2k/variant |
| 5% | 27k/variant | 7k/variant | 1.2k/variant |
| 10% | 12k/variant | 3k/variant | 550/variant |

**Calculators:**
- [Evan Miller's](https://www.evanmiller.org/ab-testing/sample-size.html)
- [Optimizely's](https://www.optimizely.com/sample-size-calculator/)

**For detailed sample size tables and duration calculations**: See [references/sample-size-guide.md](references/sample-size-guide.md)

---

## Metrics Selection

### Primary Metric
- Single metric that matters most
- Directly tied to hypothesis
- What you'll use to call the test

### Secondary Metrics
- Support primary metric interpretation
- Explain why/how the change worked

### Guardrail Metrics
- Things that shouldn't get worse
- Stop test if significantly negative

### Example: Pricing Page Test
- **Primary**: Plan selection rate
- **Secondary**: Time on page, plan di