
Statistical Analysis
Check statistical assumptions before trusting t-tests, ANOVA, regression, or related analyses so solo-built metrics and experiments are defensible.
Overview
Statistical Analysis is an agent skill most often used in Validate (also Build backend analytics and Grow analytics) that guides assumption checks and remedial steps before interpreting statistical tests.
Install
npx skills add https://github.com/davila7/claude-code-templates --skill statistical-analysis
What is this skill?
- Five general principles: check assumptions before interpreting results, use multiple diagnostics, consider robustness, document every check, and report violations with the remedial actions taken
- Independence checks with ACF/PACF plots, the Durbin-Watson test, and ICC guidance; violations are flagged HIGH severity because they can severely inflate Type I error
- Normality guidance for t-tests, ANOVA, and regression residuals, with notes on small samples versus the robustness of larger ones
- Remediation paths when independence fails: mixed-effects models, time series methods, GEE
- Structured assumption sections (what it means, how to check, what to do if violated)
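As a rough illustration of the Durbin-Watson check named in the list above, the sketch below computes the statistic by hand with NumPy. The two synthetic series and the helper name `durbin_watson` are assumptions made for this example and are not part of the skill. In practice the statistic is applied to model residuals, so the raw zero-mean series here are only stand-ins.

```python
import numpy as np


def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest no lag-1 autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    # Sum of squared successive differences divided by the sum of squares.
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


rng = np.random.default_rng(42)
n = 500

# Independent noise: each value is unrelated to the previous one.
independent = rng.normal(size=n)

# AR(1) series: each value carries over 80% of the previous value.
dependent = np.zeros(n)
shocks = rng.normal(size=n)
for t in range(1, n):
    dependent[t] = 0.8 * dependent[t - 1] + shocks[t]

dw_independent = durbin_watson(independent)
dw_dependent = durbin_watson(dependent)
print(f"independent: {dw_independent:.2f}, dependent: {dw_dependent:.2f}")
```

A value well below 2 points to positive autocorrelation, which is the situation where the skill recommends time series methods over a test that assumes independent observations.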
Adoption & trust: 637 installs on skills.sh; 27.8k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You are ready to call a metric “significant” but have not verified independence, normality, or other test assumptions—and violations can invalidate the conclusion.
Who is it for?
Solo builders running experiments, surveys, or product analytics who need a repeatable assumption checklist embedded in the agent workflow.
Skip if: Pure visualization without inference, or teams that already have a signed-off analysis plan and external biostat review with no agent involvement.
When should I use this skill?
Before interpreting statistical test results or publishing analysis conclusions from agent-assisted data work.
What do I get? / Deliverables
You get documented assumption checks, chosen diagnostics, and explicit remedial actions (e.g., mixed-effects or time-series methods) before reporting test results.
- Documented assumption checks with visual and formal diagnostics
- Recorded violations and remedial analysis approach
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Assumption checking happens before you commit to conclusions from data—Validate is the canonical shelf because it is about proving the idea and analysis plan, not shipping UI. Scope covers defining what evidence counts; diagnostic procedures fit scoping whether a test or model is appropriate for the sample and design.
Where it fits
Decide whether your landing-page A/B design supports an independent-samples test before you freeze the metric definition.
Embed assumption checks in a script that runs ANOVA on cohort tables exported from your app database.
Re-verify regression residual normality after a pricing change before updating the investor-facing metrics memo.
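The second scenario above, a script that runs ANOVA only after checking assumptions, might look roughly like the following sketch. The function name `compare_cohorts`, the report layout, the 0.05 threshold, and the synthetic cohort data are all assumptions for illustration. Falling back to Kruskal-Wallis on any failed check is a simplifying choice, and the formal tests here do not replace the visual diagnostics the skill treats as primary.

```python
import numpy as np
from scipy import stats


def compare_cohorts(groups, alpha=0.05):
    """Check normality and equal variances, then pick ANOVA or Kruskal-Wallis."""
    report = {"violations": []}

    # Normality per cohort (Shapiro-Wilk).
    for name, values in groups.items():
        _, p = stats.shapiro(values)
        if p < alpha:
            report["violations"].append(f"normality: {name} (Shapiro-Wilk p={p:.3g})")

    # Homogeneity of variance (Levene's test, median-centered by default in SciPy).
    _, levene_p = stats.levene(*groups.values())
    if levene_p < alpha:
        report["violations"].append(f"equal variance (Levene p={levene_p:.3g})")

    # Remedial action: fall back to a rank-based test when any check fails.
    if report["violations"]:
        report["test"] = "Kruskal-Wallis"
        stat, p = stats.kruskal(*groups.values())
    else:
        report["test"] = "one-way ANOVA"
        stat, p = stats.f_oneway(*groups.values())
    report["statistic"], report["p_value"] = float(stat), float(p)
    return report


# Synthetic stand-in for cohort tables exported from an app database.
rng = np.random.default_rng(0)
cohorts = {
    "jan": rng.normal(10.0, 2.0, size=40),
    "feb": rng.normal(10.5, 2.0, size=40),
    "mar": rng.normal(11.0, 2.0, size=40),
}
report = compare_cohorts(cohorts)
print(report)
```

Keeping the `violations` list in the returned report makes it straightforward to copy the recorded checks and the chosen remedial test into the analysis write-up.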
How it compares
Diagnostic methodology for inference—not a charting skill or a one-click auto-ML training pack.
Common Questions / FAQ
Who is statistical-analysis for?
Solo and indie builders doing their own statistical tests or reviewing agent-generated analyses who need structured assumption validation without hiring a statistician for every decision.
When should I use statistical-analysis?
In Validate when scoping an experiment or survey analysis; in Build when implementing analytics pipelines; in Grow when interpreting lifecycle or funnel tests—always before final significance claims.
Is statistical-analysis safe to install?
It is documentation and procedure guidance with no built-in data exfiltration; still review the Security Audits panel on this page and avoid piping raw PII into prompts unnecessarily.
SKILL.md
# Statistical Assumptions and Diagnostic Procedures

This document provides comprehensive guidance on checking and validating statistical assumptions for various analyses.

## General Principles

1. **Always check assumptions before interpreting test results**
2. **Use multiple diagnostic methods** (visual + formal tests)
3. **Consider robustness**: Some tests are robust to violations under certain conditions
4. **Document all assumption checks** in analysis reports
5. **Report violations and remedial actions taken**

## Common Assumptions Across Tests

### 1. Independence of Observations

**What it means**: Each observation is independent; measurements on one subject do not influence measurements on another.

**How to check**:
- Review study design and data collection procedures
- For time series: Check autocorrelation (ACF/PACF plots, Durbin-Watson test)
- For clustered data: Consider intraclass correlation (ICC)

**What to do if violated**:
- Use mixed-effects models for clustered/hierarchical data
- Use time series methods for temporally dependent data
- Use generalized estimating equations (GEE) for correlated data

**Critical severity**: HIGH - violations can severely inflate Type I error

---

### 2. Normality

**What it means**: Data or residuals follow a normal (Gaussian) distribution.

**When required**:
- t-tests (for small samples; robust for n > 30 per group)
- ANOVA (for small samples; robust for n > 30 per group)
- Linear regression (for residuals)
- Some correlation tests (Pearson)

**How to check**:

**Visual methods** (primary):
- Q-Q (quantile-quantile) plot: Points should fall on diagonal line
- Histogram with normal curve overlay
- Kernel density plot

**Formal tests** (secondary):
- Shapiro-Wilk test (recommended for n < 50)
- Kolmogorov-Smirnov test
- Anderson-Darling test

**Python implementation**:

```python
from scipy import stats
import matplotlib.pyplot as plt

# data: 1-D array of observations (or model residuals) to check

# Shapiro-Wilk test
statistic, p_value = stats.shapiro(data)

# Q-Q plot against the normal distribution
stats.probplot(data, dist="norm", plot=plt)
plt.show()
```

**Interpretation guidance**:
- For n < 30: Both visual and formal tests are important
- For 30 ≤ n < 100: Visual inspection primary, formal tests secondary
- For n ≥ 100: Formal tests are overly sensitive; rely on visual inspection
- Look for severe skewness, outliers, or bimodality

**What to do if violated**:
- **Mild violations** (slight skewness): Proceed if n > 30 per group
- **Moderate violations**: Use non-parametric alternatives (Mann-Whitney, Kruskal-Wallis, Wilcoxon)
- **Severe violations**:
  - Transform data (log, square root, Box-Cox)
  - Use non-parametric methods
  - Use robust regression methods
  - Consider bootstrapping

**Critical severity**: MEDIUM - parametric tests are often robust to mild violations with adequate sample size

---

### 3. Homogeneity of Variance (Homoscedasticity)

**What it means**: Variances are equal across groups or across the range of predictors.

**When required**:
- Independent samples t-test
- ANOVA
- Linear regression (constant variance of residuals)

**How to check**:

**Visual methods** (primary):
- Box plots by group (for t-test/ANOVA)
- Residuals vs. fitted values plot (for regression) - should show random scatter
- Scale-location plot (square root of standardized residuals vs. fitted)

**Formal tests** (secondary):
- Levene's test (robust to non-normality)
- Bartlett's test (sensitive to non-normality, not recommended)
- Brown-Forsythe test (median-based version of Levene's)
- Breusch-Pagan test (for regression)

**Python implementation**:

```python
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

# Levene's test across groups
# (SciPy's default center="median" is the Brown-Forsythe variant)
statistic, p_value = stats.levene(group1, group2, group3)

# For regression: Breusch-Pagan test
# residuals: residuals of the fitted model
# exog: design matrix used in the fit, including the constant column
_, bp_p_value, _, _ = het_breuschpagan(residuals, exog)
```

**Interpretation guidance**:
- Variance ratio (max/min) < 2-3: Generally acceptable
- For ANOVA: Test is robust if groups have equal sizes
- For regression: Look