Data Analysis

Solo builders most often reach for rigorous analysis when interpreting product, funnel, or experiment data in the Grow phase, even though the same pipeline applies earlier for validation datasets. Analytics is the canonical shelf for turning raw tables into tested insights and figures that inform retention, pricing, and feature bets.

Where it fits

Example use

Clean survey or competitor export tables and test differences before you freeze MVP scope or pricing tiers.

Example use

Profile event or warehouse extracts while integrating analytics pipelines so schema issues surface before ship.

Example use

Operate: Iteration & experiments

Summarize cohort retention or experiment arms with effect sizes for weekly growth reviews.

Example use

Re-run the same pipeline on production metric dumps after an incident to confirm whether a fix moved the needle.

How it compares

Use instead of asking the agent to "just plot this CSV" when you need assumption checks and citable methodology, not a one-line matplotlib snippet.

Common Questions / FAQ

Who is data-analysis for?

Solo builders and small teams shipping with Claude Code, Cursor, or Codex who analyze product, marketing, or research datasets and want scipy-grade discipline inside the agent.

When should I use data-analysis?

Use it during Validate when sizing markets from public datasets, during Build when profiling backend or integration metrics, and during Grow when interpreting retention, activation, or experiment results before changing the roadmap.

Is data-analysis safe to install?

Review the Security Audits panel on this Prism page and your org policy before running agent skills that read local files; the skill focuses on analysis libraries rather than calling external APIs by default.

SKILL.md


# Data Analysis

Part of [Agent Skills™](https://github.com/itallstartedwithaidea/agent-skills) by [googleadsagent.ai™](https://googleadsagent.ai)

## Description

Data Analysis provides a structured framework for statistical analysis using pandas, numpy, and scipy, with visualization through matplotlib, seaborn, and plotly. The agent follows a rigorous pipeline from data ingestion and cleaning through exploratory analysis, statistical testing, and publication-quality visualization, ensuring reproducibility at every step.

Scientific data analysis is not ad hoc coding; it is a disciplined process in which every transformation is justified, every statistical test has its assumptions verified, and every visualization accurately represents the underlying data. This skill enforces that discipline by requiring the agent to document data provenance, validate distributions before applying parametric tests, and report effect sizes alongside p-values.
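As a minimal sketch of what "documenting data provenance" and justifying each transformation could look like in practice, the snippet below hashes the raw input and appends one log entry per cleaning step. The helper names `load_with_provenance` and `log_step`, the log fields, and the inline CSV bytes are illustrative assumptions, not part of the skill itself.

```python
import hashlib
import io
from datetime import datetime, timezone

import pandas as pd


def load_with_provenance(raw_bytes: bytes, source: str) -> tuple:
    """Parse CSV bytes and record their origin and content hash (hypothetical helper)."""
    provenance = {
        "source": source,
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "loaded_at": datetime.now(timezone.utc).isoformat(),
        "transformations": [],
    }
    return pd.read_csv(io.BytesIO(raw_bytes)), provenance


def log_step(df: pd.DataFrame, provenance: dict, description: str, func) -> pd.DataFrame:
    """Apply one transformation and append its description and row counts to the log."""
    result = func(df)
    provenance["transformations"].append(
        {"step": description, "rows_before": len(df), "rows_after": len(result)}
    )
    return result


# Made-up example data: one missing score and one exact duplicate row.
raw = b"user_id,score\n1,10\n2,\n3,30\n3,30\n"
df, prov = load_with_provenance(raw, source="inline example bytes")
df = log_step(df, prov, "drop exact duplicate rows", lambda d: d.drop_duplicates())
df = log_step(df, prov, "drop rows with missing score", lambda d: d.dropna(subset=["score"]))
```

Keeping the log next to the data means the final report can list every step with its row counts, so a reader can see how the analyzed table differs from the raw export.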

The visualization layer produces figures suitable for journal submission: proper axis labels with units, colorblind-safe palettes, appropriate figure sizes for single or double-column layouts, and vector output formats (SVG, PDF). Interactive plotly visualizations are generated for exploratory work; static matplotlib/seaborn figures for publication.
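The figure requirements above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the skill's own plotting code: the function names are hypothetical, the column widths (3.5 in and 7.0 in) are common journal values that should be checked against the target venue, the palette is the Okabe-Ito colorblind-safe set, and the retention curves are synthetic.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Okabe-Ito colorblind-safe palette (hex values).
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]

# Assumed column widths in inches; confirm against the journal's author guidelines.
COLUMN_WIDTHS_IN = {"single": 3.5, "double": 7.0}


def sketch_column_figure(x, series: dict, xlabel: str, ylabel: str, layout: str = "single"):
    """Line plot sized for a single- or double-column layout (hypothetical helper)."""
    width = COLUMN_WIDTHS_IN[layout]
    fig, ax = plt.subplots(figsize=(width, width * 0.75))
    for color, (label, y) in zip(OKABE_ITO, series.items()):
        ax.plot(x, y, label=label, color=color)
    ax.set_xlabel(xlabel)  # axis labels carry units
    ax.set_ylabel(ylabel)
    ax.legend(frameon=False)
    fig.tight_layout()
    return fig


def render_vector(fig, fmt: str) -> bytes:
    """Render the figure to a vector format ("svg" or "pdf") and return the bytes."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format=fmt)
    return buffer.getvalue()


# Synthetic curves for illustration only.
days = np.arange(0, 8)
fig = sketch_column_figure(
    days,
    {"control": 100 * np.exp(-0.30 * days), "variant": 100 * np.exp(-0.22 * days)},
    xlabel="Time since signup (days)",
    ylabel="Retained users (%)",
)
svg_bytes = render_vector(fig, "svg")
pdf_bytes = render_vector(fig, "pdf")
plt.close(fig)
```

Rendering to an in-memory buffer keeps the example free of file paths; in a real pipeline the same `savefig` call would typically write to a `.svg` or `.pdf` file.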

## Use When

- Performing statistical analysis on experimental or observational data
- Cleaning and transforming datasets for downstream analysis
- Creating publication-quality figures and plots
- Running hypothesis tests with proper assumption checking
- Exploratory data analysis on new datasets
- Building reproducible analysis pipelines

## How It Works

```mermaid
graph TD
    A[Raw Data] --> B[Ingest + Validate Schema]
    B --> C[Clean: Missing Values, Outliers, Types]
    C --> D[Exploratory Data Analysis]
    D --> E[Distribution Assessment]
    E --> F{Parametric Assumptions Met?}
    F -->|Yes| G[Parametric Tests]
    F -->|No| H[Non-Parametric Tests]
    G --> I[Effect Size + Confidence Intervals]
    H --> I
    I --> J[Publication Visualization]
    J --> K[Reproducible Report]
```

The pipeline enforces assumption checking before test selection. Parametric tests (t-test, ANOVA) require normality and homoscedasticity; when assumptions fail, the agent automatically selects non-parametric alternatives (Mann-Whitney, Kruskal-Wallis).
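The "Effect Size + Confidence Intervals" step in the diagram is not spelled out in the text. One way it might be approached, offered only as a sketch, is a percentile bootstrap around Cohen's d. The bootstrap is a technique chosen here for illustration, not one the skill is documented to use, and the function names, resample count, and simulated groups are all assumptions.

```python
import numpy as np


def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


def bootstrap_ci(a, b, statistic=cohens_d, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample each group independently with replacement."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    estimates = np.empty(n_resamples)
    for i in range(n_resamples):
        resampled_a = rng.choice(a, size=len(a), replace=True)
        resampled_b = rng.choice(b, size=len(b), replace=True)
        estimates[i] = statistic(resampled_a, resampled_b)
    alpha = (1 - level) / 2
    low, high = np.quantile(estimates, [alpha, 1 - alpha])
    return float(low), float(high)


# Simulated experiment arms (illustrative data, true standardized difference of 1.0).
rng = np.random.default_rng(42)
control = rng.normal(loc=10.0, scale=2.0, size=200)
variant = rng.normal(loc=12.0, scale=2.0, size=200)

d = cohens_d(variant, control)
low, high = bootstrap_ci(variant, control)
print(f"Cohen's d = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Reporting the interval alongside the point estimate shows how precisely the effect is known, which a p-value alone does not convey.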

## Implementation

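```python
import pandas as pd
import numpy as np
from scipy import stats
import seaborn as sns
import matplotlib.pyplot as plt

MAX_SHAPIRO_N = 5000  # SciPy notes Shapiro-Wilk p-values may be inaccurate for N > 5000

def analysis_pipeline(filepath: str) -> dict:
    """Load a CSV and report shape, missing values, dtypes, and per-column normality."""
    df = pd.read_csv(filepath)

    report = {
        "shape": df.shape,
        "missing": df.isnull().sum().to_dict(),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }

    numeric_cols = df.select_dtypes(include=[np.number]).columns
    for col in numeric_cols:
        values = df[col].dropna().iloc[:MAX_SHAPIRO_N]  # positional slice, index-safe
        if len(values) < 3:  # Shapiro-Wilk needs at least 3 observations
            continue
        stat, p = stats.shapiro(values)
        report[f"{col}_normality"] = {"statistic": stat, "p_value": p, "normal": p > 0.05}

    return report

def compute_cohens_d(a, b) -> float:
    # Assumed helper (called but not defined in the original): Cohen's d, pooled SD.
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * np.var(a, ddof=1) + (n2 - 1) * np.var(b, ddof=1)) / (n1 + n2 - 2)
    return float((np.mean(a) - np.mean(b)) / np.sqrt(pooled_var))

def compute_eta_squared(groups) -> float:
    # Assumed helper (called but not defined in the original): SS_between / SS_total.
    values = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    grand_mean = values.mean()
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    return float(ss_between / ((values - grand_mean) ** 2).sum())

def compare_groups(df: pd.DataFrame, value_col: str, group_col: str) -> dict:
    groups = [g[value_col].dropna() for _, g in df.groupby(group_col)]
    if len(groups) < 2:
        raise ValueError("compare_groups needs at least two groups")

    # Check parametric assumptions: normality per group, then equal variances
    normality_ok = all(stats.shapiro(g.iloc[:MAX_SHAPIRO_N]).pvalue > 0.05 for g in groups)
    _, levene_p = stats.levene(*groups)
    homoscedastic = levene_p > 0.05

    if normality_ok and homoscedastic:
        stat, p = stats.f_oneway(*groups) if len(groups) > 2 else stats.ttest_ind(*groups)
        test_name = "ANOVA" if len(groups) > 2 else "t-test"
    elif len(groups) == 2:
        # Two-group non-parametric alternative named in the prose above
        stat, p = stats.mannwhitneyu(*groups, alternative="two-sided")
        test_name = "Mann-Whitney U"
    else:
        stat, p = stats.kruskal(*groups)
        test_name = "Kruskal-Wallis"

    effect = compute_cohens_d(groups[0], groups[1]) if len(groups) == 2 else compute_eta_squared(groups)

    return {"test": test_name, "statistic": stat, "p_value": p, "effect_size": effect}

# def publ  (the listing is truncated here in the source)
```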

What is this skill?

End-to-end pipeline: ingest, clean, explore, test, and visualize with pandas, numpy, and scipy

Requires documented provenance and distribution checks before parametric tests

Reports effect sizes alongside p-values for decision-ready conclusions

Static matplotlib and seaborn plus plotly for exploration, with colorblind-safe palettes

Vector SVG and PDF outputs sized for single- or double-column publication layouts

Visualization stack spans matplotlib, seaborn, and plotly with vector SVG and PDF export

Pipeline covers ingestion, cleaning, exploratory analysis, statistical testing, and visualization

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1 install on skills.sh; 18 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

Who is it for?

Indie SaaS founders analyzing funnel exports, survey results, or experiment logs who want one agent-guided ritual instead of reinventing scipy checks each time.

Skip if: Builders who only need a single quick bar chart with no statistical claims, or teams that already enforce a locked Jupyter template and external biostat review.

What do I get? / Deliverables

You get a documented analysis path, validated tests with effect sizes, and publication- or deck-ready static and interactive charts you can cite in decisions.

Cleaned dataset with documented transformation log

Statistical test results including effect sizes and assumption notes

Static and/or interactive figures with labeled axes and accessible styling

Journey fit

Spans multiple journey phases: a primary shelf plus alternate fits.
