
Data Analysis
Install this when you need reproducible EDA, statistical tests, and journal-grade charts on CSV or research data without ad-hoc notebook spaghetti.
Overview
Data Analysis is an agent skill most often used in the Grow phase (also Validate and Build) that runs a reproducible pandas–numpy–scipy analysis pipeline with matplotlib, seaborn, and plotly visualizations.
Install
npx skills add https://github.com/itallstartedwithaidea/agent-skills --skill data-analysis
What is this skill?
- End-to-end pipeline: ingest, clean, explore, test, and visualize with pandas, numpy, and scipy
- Requires documented provenance and distribution checks before parametric tests
- Reports effect sizes alongside p-values for decision-ready conclusions
- Static matplotlib and seaborn figures for publication, plus interactive plotly charts for exploration, with colorblind-safe palettes
- Vector SVG and PDF outputs sized for single- or double-column publication layouts
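Documented provenance can start with a small record captured at ingest time. The sketch below is a minimal illustration under stated assumptions, not the skill's own code: the `record_provenance` helper and its field names are hypothetical, it assumes a UTF-8 CSV with exactly one header row, and it uses only the Python standard library. A SHA-256 digest pins the exact bytes the analysis ran on.

```python
import csv
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_provenance(path: str, source: str) -> dict:
    """Hypothetical helper: fingerprint a CSV and note where it came from."""
    p = Path(path)
    raw = p.read_bytes()
    # csv.reader copes with quoted fields that span lines, unlike counting raw lines.
    with p.open(newline="", encoding="utf-8") as fh:
        n_records = sum(1 for _ in csv.reader(fh))
    return {
        "file": p.name,
        "source": source,  # free-text origin supplied by the analyst
        "sha256": hashlib.sha256(raw).hexdigest(),  # changes if any byte changes
        "bytes": len(raw),
        "data_rows": max(n_records - 1, 0),  # assumes exactly one header row
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Toy file: the first data row has a quoted note containing a newline.
demo = Path(tempfile.mkdtemp()) / "signups.csv"
demo.write_text('user_id,plan,note\n1,pro,"multi\nline"\n2,free,\n3,pro,ok\n', encoding="utf-8")
prov = record_provenance(str(demo), source="hypothetical billing export")
print(json.dumps(prov, indent=2))
```

Storing this record next to the report lets a later re-run confirm it is reading the same export: a different digest means the input changed.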
Adoption & trust: 1 install on skills.sh; 18 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
What problem does it solve?
You have messy tabular data and need trustworthy stats and figures, but exploratory scripts skip assumptions, provenance, and presentation standards.
Who is it for?
Indie SaaS founders analyzing funnel exports, survey results, or experiment logs who want one agent-guided ritual instead of reinventing scipy checks each time.
Skip if: Builders who only need a single quick bar chart with no statistical claims, or teams that already enforce a locked Jupyter template and external biostat review.
When should I use this skill?
You need structured statistical analysis and visualization on tabular data, with documented transformations and verified test assumptions.
What do I get? / Deliverables
You get a documented analysis path, validated tests with effect sizes, and publication- or deck-ready static and interactive charts you can cite in decisions.
- Cleaned dataset with documented transformation log
- Statistical test results including effect sizes and assumption notes
- Static and/or interactive figures with labeled axes and accessible styling
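A transformation log can be as simple as a list of records appended each time a cleaning step runs. This is a minimal sketch assuming pandas; `TransformationLog`, its `apply` method, and the toy columns are hypothetical names chosen for illustration, and the skill may structure its log differently.

```python
from dataclasses import dataclass, field
from typing import Callable

import pandas as pd


@dataclass
class TransformationLog:
    """Hypothetical helper that records each cleaning step applied to a DataFrame."""

    steps: list[dict] = field(default_factory=list)

    def apply(
        self,
        df: pd.DataFrame,
        name: str,
        reason: str,
        fn: Callable[[pd.DataFrame], pd.DataFrame],
    ) -> pd.DataFrame:
        # Run the step, then log why it ran and how many rows it kept.
        out = fn(df)
        self.steps.append(
            {"step": name, "reason": reason, "rows_before": len(df), "rows_after": len(out)}
        )
        return out

    def to_frame(self) -> pd.DataFrame:
        return pd.DataFrame(self.steps)


# Toy export with a duplicate row, a non-numeric revenue value, and a missing ID.
raw = pd.DataFrame(
    {"user_id": [1, 2, 2, 3, None], "revenue": ["10", "20", "20", "x", "5"]}
)
log = TransformationLog()
df = log.apply(raw, "drop_duplicates", "export repeats some rows", lambda d: d.drop_duplicates())
df = log.apply(
    df,
    "coerce_revenue",
    "revenue arrives as text; non-numeric values become NaN",
    lambda d: d.assign(revenue=pd.to_numeric(d["revenue"], errors="coerce")),
)
df = log.apply(
    df,
    "drop_incomplete",
    "rows without user_id or revenue cannot be attributed",
    lambda d: d.dropna(subset=["user_id", "revenue"]),
)
print(log.to_frame()[["step", "rows_before", "rows_after"]])
```

Requiring a `reason` for every step is the design choice that matters here: the before and after row counts make silent data loss visible in review.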
Journey fit
Spans multiple journey phases: a primary shelf plus the alternate fits below.
Solo builders most often reach for rigorous analysis when interpreting product, funnel, or experiment data in the Grow phase, even though the same pipeline applies earlier for validation datasets. Analytics is the canonical shelf for turning raw tables into tested insights and figures that inform retention, pricing, and feature bets.
Where it fits
Clean survey or competitor export tables and test differences before you freeze MVP scope or pricing tiers.
Profile event or warehouse extracts while integrating analytics pipelines so schema issues surface before ship.
Summarize cohort retention or experiment arms with effect sizes for weekly growth reviews.
Re-run the same pipeline on production metric dumps after an incident to confirm whether a fix moved the needle.
How it compares
Use instead of asking the agent to "just plot this CSV" when you need assumption checks and citable methodology, not a one-line matplotlib snippet.
Common Questions / FAQ
Who is data-analysis for?
Solo builders and small teams shipping with Claude Code, Cursor, or Codex who analyze product, marketing, or research datasets and want scipy-grade discipline inside the agent.
When should I use data-analysis?
Use it during Validate when sizing markets from public datasets, during Build when profiling backend or integration metrics, and during Grow when interpreting retention, activation, or experiment results before changing the roadmap.
Is data-analysis safe to install?
Review the Security Audits panel on this Prism page and your org policy before running agent skills that read local files; the skill focuses on analysis libraries rather than calling external APIs by default.
SKILL.md
# Data Analysis

Part of [Agent Skills™](https://github.com/itallstartedwithaidea/agent-skills) by [googleadsagent.ai™](https://googleadsagent.ai)

## Description

Data Analysis provides a structured framework for statistical analysis using pandas, numpy, and scipy, with visualization through matplotlib, seaborn, and plotly. The agent follows a rigorous pipeline from data ingestion and cleaning through exploratory analysis, statistical testing, and publication-quality visualization, ensuring reproducibility at every step.

Scientific data analysis is not exploratory coding; it is a disciplined process in which every transformation is justified, every statistical test has verified assumptions, and every visualization accurately represents the underlying data. This skill enforces that discipline by requiring the agent to document data provenance, validate distributions before applying parametric tests, and report effect sizes alongside p-values.

The visualization layer produces figures suitable for journal submission: proper axis labels with units, colorblind-safe palettes, figure sizes appropriate for single- or double-column layouts, and vector output formats (SVG, PDF). Interactive plotly visualizations are generated for exploratory work; static matplotlib/seaborn figures are produced for publication.
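The figure conventions above can be approximated with plain matplotlib settings. This sketch rests on assumptions that are ours, not the skill's: column widths of 89 mm and 183 mm (a common journal convention; check your target journal's author guidelines), the Okabe-Ito colorblind-safe palette, and the hypothetical helpers `figure_size` and `save_publication_figure`. The plotted data is illustrative only.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler

# Assumed journal column widths in millimetres; check your target journal's guidelines.
COLUMN_WIDTH_MM = {"single": 89.0, "double": 183.0}
MM_PER_INCH = 25.4

# Okabe-Ito palette, designed to stay distinguishable under common color-vision deficiencies.
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7", "#000000"]


def figure_size(layout: str = "single", aspect: float = 0.75) -> tuple[float, float]:
    """Return (width, height) in inches for a single- or double-column figure."""
    width_in = COLUMN_WIDTH_MM[layout] / MM_PER_INCH
    return width_in, width_in * aspect


def save_publication_figure(fig, stem: Path) -> list[Path]:
    """Write vector SVG and PDF copies of a figure."""
    fig.tight_layout()  # fit labels inside the fixed canvas instead of resizing it
    paths = [stem.with_suffix(f".{ext}") for ext in ("svg", "pdf")]
    for path in paths:
        fig.savefig(path)  # format is inferred from the file suffix
    return paths


plt.rcParams.update({
    "font.size": 8,
    "axes.prop_cycle": cycler(color=OKABE_ITO),
    "pdf.fonttype": 42,  # embed TrueType fonts so text stays editable in the PDF
    "svg.fonttype": "none",  # keep SVG text as text rather than outlined paths
})

# Illustrative data only.
days = np.arange(0, 31)
fig, ax = plt.subplots(figsize=figure_size("single"))
ax.plot(days, 100 * np.exp(-days / 12), label="Control")
ax.plot(days, 100 * np.exp(-days / 15), label="Variant")
ax.set_xlabel("Days since signup (d)")
ax.set_ylabel("Users still active (%)")
ax.legend(frameon=False)
paths = save_publication_figure(fig, Path(tempfile.mkdtemp()) / "retention_curve")
```

The sketch calls `tight_layout()` rather than saving with `bbox_inches="tight"`, because a tight bounding box changes the saved canvas size, and the exported figure would then no longer match the column width.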
## Use When

- Performing statistical analysis on experimental or observational data
- Cleaning and transforming datasets for downstream analysis
- Creating publication-quality figures and plots
- Running hypothesis tests with proper assumption checking
- Exploratory data analysis on new datasets
- Building reproducible analysis pipelines

## How It Works

```mermaid
graph TD
    A[Raw Data] --> B[Ingest + Validate Schema]
    B --> C[Clean: Missing Values, Outliers, Types]
    C --> D[Exploratory Data Analysis]
    D --> E[Distribution Assessment]
    E --> F{Parametric Assumptions Met?}
    F -->|Yes| G[Parametric Tests]
    F -->|No| H[Non-Parametric Tests]
    G --> I[Effect Size + Confidence Intervals]
    H --> I
    I --> J[Publication Visualization]
    J --> K[Reproducible Report]
```

The pipeline enforces assumption checking before test selection. Parametric tests (t-test, ANOVA) require normally distributed data within each group and homoscedasticity (equal variances across groups); when these assumptions fail, the agent automatically selects a non-parametric alternative: Mann-Whitney U for two groups, Kruskal-Wallis for three or more.
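The effect-size step in the diagram can be sketched with numpy alone. The Implementation code calls `compute_cohens_d` and `compute_eta_squared`, whose bodies are not shown in this excerpt, so the functions below (`cohens_d`, `eta_squared`, `bootstrap_ci`) are hypothetical stand-ins using textbook definitions: Cohen's d with a pooled standard deviation, eta squared as between-group over total sum of squares, and a percentile bootstrap as one common way to get a confidence interval. This is an assumption-labeled sketch, not the skill's implementation; `scipy.stats.bootstrap` (scipy 1.7+) is an alternative for the interval.

```python
import numpy as np


def cohens_d(a, b) -> float:
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    if pooled_var == 0:
        return float("nan")  # undefined when both groups are constant
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


def eta_squared(groups) -> float:
    """Eta squared for a one-way design: SS_between / SS_total."""
    arrays = [np.asarray(g, dtype=float) for g in groups]
    values = np.concatenate(arrays)
    grand_mean = values.mean()
    ss_total = float(((values - grand_mean) ** 2).sum())
    ss_between = float(sum(len(g) * (g.mean() - grand_mean) ** 2 for g in arrays))
    return ss_between / ss_total if ss_total > 0 else float("nan")


def bootstrap_ci(a, b, stat=cohens_d, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for a two-sample statistic, resampling each group independently."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    boots = np.array([
        stat(rng.choice(a, size=len(a), replace=True), rng.choice(b, size=len(b), replace=True))
        for _ in range(n_boot)
    ])
    alpha = (1 - level) / 2
    # nanpercentile skips degenerate resamples where the statistic is undefined.
    low, high = np.nanpercentile(boots, [100 * alpha, 100 * (1 - alpha)])
    return float(low), float(high)


# Simulated experiment arms, for illustration only.
rng = np.random.default_rng(42)
control = rng.normal(10.0, 2.0, size=40)
variant = rng.normal(11.0, 2.0, size=40)
d = cohens_d(variant, control)
low, high = bootstrap_ci(variant, control)
print(f"d = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
print(f"eta^2 = {eta_squared([control, variant]):.3f}")
```

Reporting the interval next to the point estimate shows how precisely the effect is pinned down, which a p-value alone does not.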
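## Implementation

```python
import pandas as pd
import numpy as np
from scipy import stats
import seaborn as sns
import matplotlib.pyplot as plt

ALPHA = 0.05  # significance threshold used by the assumption checks
SHAPIRO_MAX_N = 5000  # scipy warns that Shapiro-Wilk p-values may be inaccurate above this size


def analysis_pipeline(filepath: str) -> dict:
    """Ingest a CSV and report shape, missingness, dtypes, and per-column normality."""
    df = pd.read_csv(filepath)
    report = {
        "shape": df.shape,
        "missing": df.isnull().sum().to_dict(),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    for col in numeric_cols:
        values = df[col].dropna().iloc[:SHAPIRO_MAX_N]  # positional slice, not label-based
        if len(values) < 3:  # Shapiro-Wilk needs at least 3 observations
            continue
        stat, p = stats.shapiro(values)
        report[f"{col}_normality"] = {"statistic": float(stat), "p_value": float(p), "normal": bool(p > ALPHA)}
    return report


def compare_groups(df: pd.DataFrame, value_col: str, group_col: str) -> dict:
    """Choose a parametric or non-parametric test from normality and equal-variance checks."""
    groups = [g[value_col].dropna() for _, g in df.groupby(group_col)]
    if len(groups) < 2:
        raise ValueError("compare_groups needs at least two groups")
    normality_ok = all(stats.shapiro(g.iloc[:SHAPIRO_MAX_N]).pvalue > ALPHA for g in groups)
    _, levene_p = stats.levene(*groups)
    homoscedastic = levene_p > ALPHA
    if normality_ok and homoscedastic:
        if len(groups) > 2:
            stat, p = stats.f_oneway(*groups)
            test_name = "ANOVA"
        else:
            stat, p = stats.ttest_ind(*groups)
            test_name = "t-test"
    elif len(groups) > 2:
        stat, p = stats.kruskal(*groups)
        test_name = "Kruskal-Wallis"
    else:
        stat, p = stats.mannwhitneyu(*groups, alternative="two-sided")
        test_name = "Mann-Whitney U"
    # Effect-size helpers; their definitions are not shown in this excerpt.
    effect = compute_cohens_d(groups[0], groups[1]) if len(groups) == 2 else compute_eta_squared(groups)
    return {"test": test_name, "statistic": float(stat), "p_value": float(p), "effect_size": effect}


def publ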