
A/B Test Setup
Install this when you need a statistically defensible A/B or split-test plan—hypothesis, one variable, sample size—before changing copy, pricing UI, or onboarding flows.
Overview
A/B Test Setup is an agent skill, used most often in the Grow phase (and also in Validate and Launch), that designs statistically valid experiments with clear hypotheses and single-variable variants.
Install
npx skills add https://github.com/alirezarezvani/claude-skills --skill ab-test-setup
What is this skill?
- Forces a specific hypothesis and single-variable tests so results are interpretable
- Runs an initial assessment for context, baseline conversion, traffic, and constraints before design
- Reads `.claude/product-marketing-context.md` when present to avoid redundant questions
- Emphasizes pre-set sample size and not peeking early for statistical rigor
- Hands off implementation tracking to analytics-tracking when instrumentation is needed
- Core principles: 4 numbered pillars (hypothesis, one thing, statistical rigor, measure what matters)
- Initial assessment: 3 areas (test context, current state, constraints)
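The skill insists on a pre-set sample size. A quick way to see why is to simulate A/A tests. In an A/A test both arms share the same true conversion rate, so every "significant" result is a false positive.

The sketch below is a hypothetical illustration, not part of the skill. It assumes:

- a pooled two-proportion z-test at a two-sided 5% level,
- a 5% conversion rate,
- ten evenly spaced interim looks.

The names `z_stat` and `simulate_aa` are made up for this example.

```python
import math
import random


def z_stat(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic (normal approximation)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_b / n_b - conv_a / n_a) / se


def simulate_aa(runs: int = 300, n_per_arm: int = 3000, looks: int = 10,
                rate: float = 0.05, z_crit: float = 1.96, seed: int = 7):
    """Simulate A/A tests (no real difference between arms).

    Returns (fixed_rate, peeking_rate): the share of runs declared
    significant by one look at the pre-set sample size, versus by
    stopping at the first of `looks` evenly spaced checks with |z| > z_crit.
    """
    rng = random.Random(seed)
    checkpoints = {n_per_arm * k // looks for k in range(1, looks + 1)}
    fixed_hits = peek_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        crossed = False
        for n in range(1, n_per_arm + 1):
            conv_a += rng.random() < rate  # True counts as 1
            conv_b += rng.random() < rate
            if n in checkpoints and abs(z_stat(conv_a, n, conv_b, n)) > z_crit:
                crossed = True  # a peeker would stop here and call a winner
        # Fixed horizon: only the final look at n_per_arm counts.
        if abs(z_stat(conv_a, n_per_arm, conv_b, n_per_arm)) > z_crit:
            fixed_hits += 1
        if crossed:
            peek_hits += 1
    return fixed_hits / runs, peek_hits / runs


if __name__ == "__main__":
    fixed, peeking = simulate_aa()
    print(f"single look at pre-set sample size: {fixed:.1%} false positives")
    print(f"stop at first significant peek (10 looks): {peeking:.1%} false positives")
```

With repeated looks, the false-positive rate typically climbs well above the nominal 5%. The single fixed-horizon look stays near it. Exact figures depend on the seed, the run count, and the number of looks.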
Adoption & trust: 525 installs on skills.sh; 17.5k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You want to test a copy or UX change but do not have a hypothesis, sample size plan, or guardrails—and risk calling noise a win.
Who is it for?
Solo founders with meaningful traffic who are about to change onboarding, pricing presentation, or landing CTAs and need rigor without hiring a growth team.
Skip if: you have a brand-new idea with near-zero traffic, where a formal A/B test cannot reach adequate statistical power, or the work is pure instrumentation with no experiment design (use analytics-tracking instead).
When should I use this skill?
Use it when the user wants to plan, design, or implement an A/B test. It also applies when the user mentions an A/B test, split test, experiment, variant copy, multivariate test, hypothesis, conversion experiment, statistical significance, or "test this change."
What do I get? / Deliverables
You leave with a test design you can implement and analyze confidently, with tracking deferred to analytics-tracking when events are not wired yet.
- Test hypothesis and single-variable plan
- Sample-size and stopping discipline outline
- Constraints-aware experiment design ready for tooling
Journey fit
Spans multiple journey phases: a primary shelf, plus the alternate fits below.
Experiment design and significance discipline belong where you measure conversion and iterate on what has shipped. That makes Grow → analytics the primary shelf, even when the test targets a landing page. The skill centers on baseline rates, traffic volume, sample size, and actionable metrics rather than one-off creative, which is why analytics is the canonical subphase.
Where it fits
Compare two hero headlines on a waitlist page before committing to a full product build.
Size a signup-flow experiment using baseline conversion and weekly traffic.
Test positioning snippets in an email or ads landing path with a single CTA change.
Run one pricing presentation variant with a pre-registered success metric.
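For the "size a signup-flow experiment" case, the standard normal-approximation formula for comparing two conversion rates does the arithmetic. It turns a baseline rate, a relative lift, and weekly traffic into a per-variant target and a run length.

This is a minimal sketch under stated assumptions:

- two-sided alpha of 0.05 and 80% power,
- traffic split evenly across variants,
- every visitor eligible for the test.

`sample_size_per_variant` and `weeks_needed` are hypothetical helpers. Their output will not exactly match the skill's quick-reference table or online calculators, which may use different conventions.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per variant to detect `relative_lift` over `baseline`
    with a two-sided two-proportion z-test (normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("baseline and lifted rate must both be in (0, 1)")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def weeks_needed(per_variant: int, weekly_traffic: int, variants: int = 2) -> int:
    """Whole weeks to reach the target, assuming all eligible weekly traffic
    is split evenly across `variants` (the control counts as one)."""
    return math.ceil(per_variant * variants / weekly_traffic)


if __name__ == "__main__":
    n = sample_size_per_variant(baseline=0.05, relative_lift=0.20)
    print(n, "visitors per variant")
    print(weeks_needed(n, weekly_traffic=3000), "weeks at 3,000 visitors/week")
```

Fix these numbers before launch and run to the target. Smaller lifts or lower baselines push the requirement up sharply.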
How it compares
Experiment design and stats guardrails—not the same skill as wiring pixels and event schemas.
Common Questions / FAQ
Who is ab-test-setup for?
Solo and indie builders running SaaS, content, or ecommerce surfaces who mention split tests, variants, hypotheses, or conversion experiments and need a structured plan first.
When should I use ab-test-setup?
In Grow → analytics when measuring funnel changes; in Validate → landing when testing page variants before full build; in Launch → distribution when comparing channel or offer messaging—with one variable per test.
Is ab-test-setup safe to install?
It is guidance-heavy and may read local marketing context files if you have them. Review the Security Audits panel on this Prism page before installing any third-party skill.
Workflow Chain
Then invoke: analytics-tracking
SKILL.md
# A/B Test Setup

You are an expert in experimentation and A/B testing. Your goal is to help design tests that produce statistically valid, actionable results.

## Initial Assessment

**Check for product marketing context first:**
If `.claude/product-marketing-context.md` exists, read it before asking questions. Use that context and only ask for information not already covered or specific to this task.

Before designing a test, understand:

1. **Test Context** - What are you trying to improve? What change are you considering?
2. **Current State** - Baseline conversion rate? Current traffic volume?
3. **Constraints** - Technical complexity? Timeline? Tools available?

---

## Core Principles

### 1. Start with a Hypothesis
- Not just "let's see what happens"
- Specific prediction of outcome
- Based on reasoning or data

### 2. Test One Thing
- Single variable per test
- Otherwise you don't know what worked

### 3. Statistical Rigor
- Pre-determine sample size
- Don't peek and stop early
- Commit to the methodology

### 4. Measure What Matters
- Primary metric tied to business value
- Secondary metrics for context
- Guardrail metrics to prevent harm

---

## Hypothesis Framework

### Structure

```
Because [observation/data], we believe [change] will cause [expected outcome] for [audience]. We'll know this is true when [metrics].
```

### Example

**Weak**: "Changing the button color might increase clicks."

**Strong**: "Because users report difficulty finding the CTA (per heatmaps and feedback), we believe making the button larger and using contrasting color will increase CTA clicks by 15%+ for new visitors. We'll measure click-through rate from page view to signup start."
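The hypothesis template and the "Test One Thing" principle can be encoded as a plan record. The record refuses to render an incomplete hypothesis and flags variants that change more than one variable.

This is a hypothetical sketch; `ExperimentPlan` and its fields are illustrative names, not part of the skill. Note that what counts as "one variable" is a judgment call. The strong example bundles a larger size and a contrasting color into a single CTA-prominence change, so the usage below models it as one `cta_style` key.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentPlan:
    """Hypothetical plan record mirroring the hypothesis template."""
    observation: str
    change: str
    expected_outcome: str
    audience: str
    metrics: str
    control: dict = field(default_factory=dict)  # variable name -> value
    variant: dict = field(default_factory=dict)

    def hypothesis(self) -> str:
        """Render the 'Because ..., we believe ...' template."""
        slots = [self.observation, self.change, self.expected_outcome,
                 self.audience, self.metrics]
        if any(not s.strip() for s in slots):
            raise ValueError("every hypothesis slot must be filled in")
        return (f"Because {self.observation}, we believe {self.change} "
                f"will cause {self.expected_outcome} for {self.audience}. "
                f"We'll know this is true when {self.metrics}.")

    def changed_variables(self) -> set:
        """Keys whose values differ between control and variant."""
        keys = self.control.keys() | self.variant.keys()
        return {k for k in keys if self.control.get(k) != self.variant.get(k)}

    def check_single_variable(self) -> None:
        """Raise unless exactly one variable differs from the control."""
        changed = self.changed_variables()
        if len(changed) != 1:
            raise ValueError(
                f"expected exactly one changed variable, got {sorted(changed)}")


if __name__ == "__main__":
    plan = ExperimentPlan(
        observation="users report difficulty finding the CTA",
        change="a larger, higher-contrast CTA button",
        expected_outcome="a 15%+ increase in CTA clicks",
        audience="new visitors",
        metrics="click-through rate from page view to signup start rises",
        control={"cta_style": "default"},
        variant={"cta_style": "large_contrast"},
    )
    plan.check_single_variable()
    print(plan.hypothesis())
```

Keeping the record alongside the test configuration makes it harder to launch a variant whose hypothesis was never written down.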
---

## Test Types

| Type | Description | Traffic Needed |
|------|-------------|----------------|
| A/B | Two versions, single change | Moderate |
| A/B/n | Multiple variants | Higher |
| MVT | Multiple changes in combinations | Very high |
| Split URL | Different URLs for variants | Moderate |

---

## Sample Size

### Quick Reference

| Baseline | 10% Lift (relative) | 20% Lift (relative) | 50% Lift (relative) |
|----------|----------|----------|----------|
| 1% | 150k/variant | 39k/variant | 6k/variant |
| 3% | 47k/variant | 12k/variant | 2k/variant |
| 5% | 27k/variant | 7k/variant | 1.2k/variant |
| 10% | 12k/variant | 3k/variant | 550/variant |

**Calculators:**
- [Evan Miller's](https://www.evanmiller.org/ab-testing/sample-size.html)
- [Optimizely's](https://www.optimizely.com/sample-size-calculator/)

**For detailed sample size tables and duration calculations**: See [references/sample-size-guide.md](references/sample-size-guide.md)

---

## Metrics Selection

### Primary Metric
- Single metric that matters most
- Directly tied to hypothesis
- What you'll use to call the test

### Secondary Metrics
- Support primary metric interpretation
- Explain why/how the change worked

### Guardrail Metrics
- Things that shouldn't get worse
- Stop test if significantly negative

### Example: Pricing Page Test
- **Primary**: Plan selection rate
- **Secondary**: Time on page, plan distribution
- **Guardrail**: Support tickets, refund rate

---

## Designing Variants

### What to Vary

| Category | Examples |
|----------|----------|
| Headlines/Copy | Message angle, value prop, specificity, tone |
| Visual Design | Layout, color, images, hierarchy |
| CTA | Button copy, size, placement, number |
| Content | Information included, order, amount, social proof |

### Best Practices
- Single, meaningful change
- B