
Validate Data
Run a structured QA pass on an analysis, SQL, or notebook before you email slides, ship a dashboard, or defend metrics to stakeholders.
Overview
validate-data is an agent skill that QA-checks analyses for methodology, accuracy, and bias before they are shared with stakeholders. It is most often used in the Grow phase and also fits Validate and Ship.
Install
npx skills add https://github.com/anthropics/knowledge-work-plugins --skill validate-data
What is this skill?
- Workflow reviews methodology, assumptions, population, and metric definitions
- Supports documents, notebooks, spreadsheets, SQL results, and charts in one pass
- Checks question framing and whether conclusions follow from the data
- Produces a confidence assessment plus concrete improvement suggestions before sharing
Adoption & trust: 1.7k installs on skills.sh; 19.6k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You are about to present numbers or conclusions but have not systematically checked definitions, exclusions, SQL sanity, or unsupported claims.
Who is it for?
Indie operators reviewing SQL dashboards, experiment readouts, or notebook findings before a stakeholder email or launch post.
Skip if: you are building greenfield ETL, collecting raw data, or need formal statistical peer review beyond agent-assisted checklist QA.
When should I use this skill?
Use when reviewing an analysis before a stakeholder presentation, spot-checking calculations, verifying SQL results, or assessing whether conclusions are supported by the data (/validate-data).
What do I get? / Deliverables
You receive a confidence-style assessment with specific fixes so the analysis is defensible before slides, dashboards, or investor updates go out.
- Confidence assessment
- Methodology and bias findings
- Improvement suggestions list
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Grow → analytics because the skill gates insights and metrics you are about to use for decisions, retention narratives, or stakeholder updates—not the initial data pipeline build. It focuses on confidence, bias, and calculation checks on finished analysis artifacts rather than ETL implementation or generic code review.
Where it fits
Stress-test churn definitions before updating a weekly metrics email.
Verify survey counts support the pricing tier story on a waitlist page.
Confirm launch funnel SQL matches the narrative in release notes.
How it compares
Use as a structured analysis reviewer before sharing, not as a substitute for exploratory data analysis or automated test suites on application code.
Common Questions / FAQ
Who is validate-data for?
Solo builders and small teams who produce analytics, SQL, or notebooks and need a consistent last-mile QA step before external communication.
When should I use validate-data?
In Grow → analytics before lifecycle or metric reviews, in Validate → scope when pricing or landing claims depend on data, and in Ship → launch when release KPIs must be spot-checked.
Is validate-data safe to install?
It reviews content you provide; check the Security Audits panel on this Prism page and avoid pasting production secrets or PII you would not put in a normal analytics doc.
SKILL.md
# /validate-data - Validate Analysis Before Sharing

> If you see unfamiliar placeholders or need to check which tools are connected, see [CONNECTORS.md](../../CONNECTORS.md).

Review an analysis for accuracy, methodology, and potential biases before sharing with stakeholders. Generates a confidence assessment and improvement suggestions.

## Usage

```
/validate-data <analysis to review>
```

The analysis can be:

- A document or report in the conversation
- A file (markdown, notebook, spreadsheet)
- SQL queries and their results
- Charts and their underlying data
- A description of methodology and findings

## Workflow

### 1. Review Methodology and Assumptions

Examine:

- **Question framing**: Is the analysis answering the right question? Could the question be interpreted differently?
- **Data selection**: Are the right tables/datasets being used? Is the time range appropriate?
- **Population definition**: Is the analysis population correctly defined? Are there unintended exclusions?
- **Metric definitions**: Are metrics defined clearly and consistently? Do they match how stakeholders understand them?
- **Baseline and comparison**: Is the comparison fair? Are time periods, cohort sizes, and contexts comparable?

### 2. Run the Pre-Delivery QA Checklist

Work through the checklist below: data quality, calculation, reasonableness, and presentation checks.

### 3. Check for Common Analytical Pitfalls

Systematically review against the detailed pitfall catalog below (join explosion, survivorship bias, incomplete period comparison, denominator shifting, average of averages, timezone mismatches, selection bias).

### 4. Verify Calculations and Aggregations

Where possible, spot-check:

- Recalculate a few key numbers independently
- Verify that subtotals sum to totals
- Check that percentages sum to 100% (or close to it) where expected
- Confirm that YoY/MoM comparisons use the correct base periods
- Validate that filters are applied consistently across all metrics

Apply the result sanity-checking techniques below (magnitude checks, cross-validation, red-flag detection).

### 5. Assess Visualizations

If the analysis includes charts:

- Do axes start at appropriate values (zero for bar charts)?
- Are scales consistent across comparison charts?
- Do chart titles accurately describe what's shown?
- Could the visualization mislead a quick reader?
- Are there truncated axes, inconsistent intervals, or 3D effects that distort perception?

### 6. Evaluate Narrative and Conclusions

Review whether:

- Conclusions are supported by the data shown
- Alternative explanations are acknowledged
- Uncertainty is communicated appropriately
- Recommendations follow logically from findings
- The level of confidence matches the strength of evidence

### 7. Suggest Improvements

Provide specific, actionable suggestions:

- Additional analyses that would strengthen the conclusions
- Caveats or limitations that should be noted
- Better visualizations or framings for key points
- Missing context that stakeholders would want

### 8. Generate Confidence Assessment

Rate the analysis on a 3-level scale:

**Ready to share** -- Analysis is methodologically sound, calculations verified, caveats noted. Minor suggestions for improvement but nothing blocking.

**Share with noted caveats** -- Analysis is largely correct but has specific limitations or assumptions that must be communicated to stakeholders. List the required caveats.

**Needs revision** -- Found specific errors, methodological issues, or missing analyses that should be addressed before sharing. List the required changes
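
As an illustration of the spot checks in workflow step 4 (subtotals that sum to totals, percentages that sum to roughly 100%) and the average-of-averages pitfall named in step 3, here is a minimal Python sketch. It is not part of the skill itself: the function names, the tolerances, and the sample numbers are all assumptions chosen for the example.

```python
from math import isclose


def subtotals_match_total(subtotals, total, abs_tol=0.01):
    """Return True when the subtotals add up to the reported total.

    abs_tol is a hypothetical rounding allowance (e.g. one cent).
    """
    return isclose(sum(subtotals), total, rel_tol=0.0, abs_tol=abs_tol)


def percentages_sum_to_100(percentages, tolerance=0.5):
    """Return True when shares sum to 100 within `tolerance` percentage points."""
    return abs(sum(percentages) - 100.0) <= tolerance


def naive_and_weighted_mean(group_means, group_sizes):
    """Return (naive, weighted) means for per-group averages.

    A large gap between the two signals the average-of-averages pitfall:
    the naive figure ignores how many rows sit behind each group.
    """
    naive = sum(group_means) / len(group_means)
    weighted = sum(m * n for m, n in zip(group_means, group_sizes)) / sum(group_sizes)
    return naive, weighted


if __name__ == "__main__":
    # Made-up figures, used only to exercise the checks.
    print(subtotals_match_total([120.0, 80.0, 50.0], 250.0))
    print(percentages_sum_to_100([33.3, 33.3, 33.3]))
    print(naive_and_weighted_mean([10.0, 20.0], [90, 10]))
```

In the last call the naive mean of the two group averages is 15.0, while weighting by group size gives 11.0. A gap like that is worth flagging before a blended average reaches stakeholders.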