A B Test Design

Name: A B Test Design
Author: owl-listener

owl-listener/designer-skills

954 installs
1.9k repo stars
Updated June 14, 2026

a-b-test-design is an agent skill that designs rigorous A/B experiments with hypotheses, metrics, and sample sizes for developers who need statistically valid UI tests before launch.

About

a-b-test-design is a designer-skills agent skill for rigorous A/B experiment design. It structures hypotheses as If we [change], then [outcome] will [improve/decrease] because [rationale], defines control and treatment variants with one isolated variable, selects a primary success metric, and calculates required sample sizes for statistical rigor. Secondary metrics and guardrails can be included. Use a-b-test-design before implementing split tests when results must be actionable and defensible. The skill prevents underpowered experiments and multi-variable confounding common in ad-hoc UI changes.

Designs experiments using the exact 6-part structure: Hypothesis, Variants, Primary Metric, Secondary Metrics, Sample Size, Duration
Enforces one-variable-at-a-time rule and isolates control vs treatment
Calculates required sample size from minimum detectable effect, baseline rate, a 95% confidence level (significance level α = 0.05), and 80% power
Lists common pitfalls including peeking, insufficient sample size, novelty effects and segmentation bias
Explicitly flags when not to run an A/B test (low traffic, ethical concerns, foundational changes)

A B Test Design by the numbers

954 all-time installs (skills.sh)
+43 installs in the week ending Jul 29, 2026 (Skillselion tracking)
Ranked #491 of 3,282 Productivity & Planning skills by installs in the Skillselion catalog
Security screen: LOW risk (skills.sh audit)
Data as of Jul 31, 2026 (Skillselion catalog sync)

npx skills add https://github.com/owl-listener/designer-skills --skill a-b-test-design

Installs	954
repo stars	★ 1.9k
Security audit	3 / 3 scanners passed
Last updated	June 14, 2026
Repository	owl-listener/designer-skills ↗

How do you design a statistically valid A/B test?

Design statistically valid A/B tests with clear hypotheses, isolated variants, primary and secondary metrics, and proper sample size calculations.

Who is it for?

Product and frontend teams planning UI experiments who need sample size rigor and single-variable isolation.

Skip if: your team is already running tests and only needs results analysis, or you have no hypothesis to validate.

When should I use this skill?

A UI or product change needs a structured A/B test plan with hypothesis, metrics, and sample size before implementation.

What you get

A/B test plan with hypothesis, control and treatment variants, primary metric, sample size, and secondary metrics.

A/B test design document
Sample size calculation

Files

SKILL.md (Markdown) GitHub ↗

A/B Test Design

You are an expert in designing rigorous A/B experiments that produce actionable results.

What You Do

You design A/B tests with clear hypotheses, controlled variants, appropriate metrics, and statistical rigor.

Test Structure

1. Hypothesis

Structured as: 'If we [change], then [outcome] will [improve/decrease] because [rationale].'
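The template above can be treated as a small structured record, which makes it harder to skip the rationale or leave the direction ambiguous. The following is a minimal sketch; the `Hypothesis` class and its field names are illustrative assumptions and are not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical container for the four slots of the hypothesis template."""
    change: str      # what is being changed
    outcome: str     # the metric or behaviour expected to move
    direction: str   # "improve" or "decrease"
    rationale: str   # why the change should cause the outcome

    def __post_init__(self):
        if self.direction not in ("improve", "decrease"):
            raise ValueError("direction must be 'improve' or 'decrease'")
        for name in ("change", "outcome", "rationale"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")

    def render(self) -> str:
        # Fill the template: If we [change], then [outcome] will
        # [improve/decrease] because [rationale].
        return (
            f"If we {self.change}, then {self.outcome} will "
            f"{self.direction} because {self.rationale}."
        )


example = Hypothesis(
    change="shorten the signup form",
    outcome="signup completion rate",
    direction="improve",
    rationale="fewer fields reduce friction",
)
print(example.render())
```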

2. Variants

Control (A): current design
Treatment (B): proposed change
Keep changes isolated — test one variable at a time
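One way to check the one-variable rule mechanically is to describe each variant as a flat configuration and confirm that exactly one key differs. This is a sketch under that assumption; the function names and the example keys (`cta_text`, `cta_color`) are hypothetical.

```python
_MISSING = object()  # sentinel so a key absent from one variant counts as a difference


def changed_keys(control: dict, treatment: dict) -> list:
    """Return the sorted keys whose values differ between the two variants."""
    keys = set(control) | set(treatment)
    return sorted(
        k for k in keys
        if control.get(k, _MISSING) != treatment.get(k, _MISSING)
    )


def assert_single_variable(control: dict, treatment: dict) -> str:
    """Raise ValueError unless exactly one key differs; return that key."""
    diff = changed_keys(control, treatment)
    if len(diff) != 1:
        raise ValueError(f"expected exactly one changed variable, found {diff}")
    return diff[0]


control = {"cta_text": "Sign up", "cta_color": "blue"}
treatment = {"cta_text": "Start free trial", "cta_color": "blue"}
print(assert_single_variable(control, treatment))
```

If the treatment also changed `cta_color`, the check would raise, signalling that a lift could not be attributed to either change alone.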

3. Primary Metric

The single most important measure of success. Must be measurable, relevant, and sensitive to the change.

4. Secondary Metrics

Supporting measures and guardrail metrics to detect unintended consequences.

5. Sample Size

Based on: minimum detectable effect, baseline conversion rate, significance level (typically α = 0.05, i.e. 95% confidence), and power (typically 80%).
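The four inputs above combine in the standard normal-approximation formula for comparing two proportions. The sketch below assumes a two-sided test, equal traffic allocation, and a minimum detectable effect expressed as an absolute difference in conversion rate. The function name and parameters are illustrative, and the skill may use a different method.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided two-proportion test.

    baseline: control conversion rate, e.g. 0.10
    mde_abs:  minimum detectable effect as an absolute difference, e.g. 0.02
    """
    p1 = baseline
    p2 = baseline + mde_abs
    if not (0 < p1 < 1 and 0 < p2 < 1) or mde_abs == 0:
        raise ValueError("rates must lie in (0, 1) and the effect must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2

    # n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    #      + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))^2 / (p2 - p1)^2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


n = sample_size_per_variant(baseline=0.10, mde_abs=0.02)
print(n)
```

For a 10% baseline and a 2-point absolute lift, this lands in the high 3,000s per variant. Halving the detectable effect roughly quadruples the requirement, which is why the minimum detectable effect is the most consequential input.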

6. Duration

Run until the required sample size is reached, then round up to full weeks to cover weekly cycles. Expect a minimum of 1-2 weeks in most cases.
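The duration rule reduces to simple arithmetic once the sample size and daily traffic are known. A minimal sketch, assuming traffic is steady and every eligible user is enrolled; the function name, parameters, and traffic figures are hypothetical.

```python
import math


def duration_days(sample_per_variant, n_variants, daily_eligible_users, min_weeks=1):
    """Days to run the test, rounded up to whole weeks."""
    total_needed = sample_per_variant * n_variants
    days_needed = math.ceil(total_needed / daily_eligible_users)
    weeks = max(min_weeks, math.ceil(days_needed / 7))
    return weeks * 7


# 3,841 users per variant, two variants, 400 eligible users per day:
# 7,682 users takes 20 days, which rounds up to 3 full weeks.
print(duration_days(3841, 2, 400))               # 21
# High traffic still runs for the minimum number of full weeks.
print(duration_days(3841, 2, 5000, min_weeks=2))  # 14
```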

Common Pitfalls

Peeking at results before completion
Too many variants at once
Metric not sensitive enough to detect change
Sample size too small
Not accounting for novelty effects
Ignoring segmentation effects
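The first pitfall, peeking, can be illustrated with a small A/A simulation: both arms share the same true conversion rate, so every "significant" result is a false positive. The sketch below compares stopping at the first significant interim look against testing once at the planned sample size. All names and parameter values are illustrative assumptions, and the simulation uses a pooled two-proportion z-test.

```python
import random
from statistics import NormalDist


def is_significant(conv_a, conv_b, n, alpha=0.05):
    """Two-sided pooled two-proportion z-test with n users per arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = (2 * pooled * (1 - pooled) / n) ** 0.5
    if se == 0:
        return False
    z = (conv_b - conv_a) / n / se
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def false_positive_rates(n_per_arm=1000, looks=10, rate=0.10, runs=500, seed=0):
    """Return (rate with peeking, rate with a single final test) for A/A tests."""
    rng = random.Random(seed)
    checkpoints = [n_per_arm * (i + 1) // looks for i in range(looks)]
    peeking_hits = 0
    fixed_hits = 0
    for _run in range(runs):
        conv_a = conv_b = seen = 0
        any_look_significant = False
        final_significant = False
        for checkpoint in checkpoints:
            for _user in range(checkpoint - seen):
                conv_a += rng.random() < rate
                conv_b += rng.random() < rate
            seen = checkpoint
            final_significant = is_significant(conv_a, conv_b, checkpoint)
            any_look_significant = any_look_significant or final_significant
        peeking_hits += any_look_significant  # would have stopped at some look
        fixed_hits += final_significant       # tested once, at the planned size
    return peeking_hits / runs, fixed_hits / runs


peeking_rate, fixed_rate = false_positive_rates()
print(f"peeking: {peeking_rate:.3f}  fixed horizon: {fixed_rate:.3f}")
```

Because the final look is one of the interim looks, the peeking rate can never be lower than the fixed-horizon rate, and with repeated looks it is typically well above the nominal 5%.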

When Not to A/B Test

Very low traffic (insufficient sample)
Ethical concerns with withholding improvement
Foundational changes that affect everything
When qualitative insight is more valuable

Best Practices

One hypothesis per test
Document everything before starting
Don't stop early on positive results
Analyze segments after overall results
Share learnings broadly regardless of outcome

FAQ

What hypothesis format does a-b-test-design use?

a-b-test-design structures hypotheses as: If we [change], then [outcome] will [improve/decrease] because [rationale]. This keeps variants tied to a measurable primary metric and sample size plan.

Does a-b-test-design calculate sample size?

Yes. a-b-test-design bases sample size on the minimum detectable effect, baseline conversion rate, significance level, and power. It also enforces one isolated variable between control and treatment variants with a defined primary success metric.

Is A B Test Design safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

Productivity & Planning, lifecycle