
Clean Data Xls
Normalize messy spreadsheet columns—trim, dedupe, fix number-as-text and dates—before dashboards, models, or financial analysis in Excel or .xlsx files.
Overview
Clean Data XLS is an agent skill, most often used in the Grow phase (and also in Validate and Build), that profiles and cleans messy spreadsheet ranges (whitespace, casing, types, dates, and duplicates) to produce analysis-ready data.
Install
npx skills add https://github.com/anthropics/financial-services-plugins --skill clean-data-xls
What is this skill?
- Profiles each column's dominant type (text, number, or date) and flags mixed-type outliers before transforming anything
- Fixes whitespace, inconsistent categorical casing, currency or percent artifacts, and mixed date formats
- Detects duplicates and chooses in-place cleanup vs helper-column formulas under Office JS or openpyxl
- Runs in Excel via Office JS (Excel.run) or on standalone .xlsx files with Python/openpyxl
- Triggered by messy-data phrases: clean this data, dedupe, normalize, standardize this column
- Issue detection matrix covers whitespace, casing, number-as-text, mixed dates, duplicates, blanks, mixed types, encoding problems, and formula errors
- Dual runtime paths: Excel Office JS and standalone Python/openpyxl
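Profiling comes before any cell is touched. Below is a minimal stdlib Python sketch of what "dominant type plus mixed-type outliers" can mean, assuming cell values arrive the way openpyxl returns them from `ws.iter_rows(values_only=True)` (numbers as `int`/`float`, date cells as `datetime`, text as `str`, empty cells as `None`). The names `classify` and `profile_column` are hypothetical and not part of the skill itself.

```python
from collections import Counter
from datetime import date


def classify(value):
    """Return 'blank', 'boolean', 'number', 'date', or 'text' for one cell value."""
    if value is None or (isinstance(value, str) and not value.strip()):
        return "blank"
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, date):  # datetime is a subclass of date
        return "date"
    return "text"


def profile_column(values):
    """Return the dominant non-blank type and the positions that disagree with it.

    Outliers are positions whose type differs from the dominant one, e.g. the
    few text entries in a mostly numeric column. Ties go to the type seen first.
    """
    kinds = [classify(v) for v in values]
    counts = Counter(k for k in kinds if k != "blank")
    if not counts:
        return {"dominant": "blank", "counts": {}, "outliers": [], "blanks": len(kinds)}
    dominant = counts.most_common(1)[0][0]
    outliers = [i for i, kind in enumerate(kinds) if kind not in ("blank", dominant)]
    return {
        "dominant": dominant,
        "counts": dict(counts),
        "outliers": outliers,
        "blanks": kinds.count("blank"),
    }
```

For example, `profile_column([1200, 950.5, "1,100", None, 870, "n/a"])` reports `number` as the dominant type, flags positions 2 and 5, and counts one blank. Position 2 is a number stored as text, which is exactly what the number-as-text fix targets; position 5 is a genuine text outlier that needs a human decision.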
Adoption & trust: 732 installs on skills.sh; 30.5k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your spreadsheet mixes numbers stored as text, inconsistent date formats, and duplicate rows, so pivots, imports, and metrics silently give wrong answers.
Who is it for?
Founders cleaning revenue, pipeline, or ops exports in Excel or .xlsx before building charts, forecasts, or investor-ready tables.
Skip if: you need large-scale warehouse ETL, are working with regulated archives that require immutable audit trails, or have datasets that need bespoke ML imputation rather than rule-based hygiene.
When should I use this skill?
Use it when data is messy, inconsistent, or needs prep before analysis, for example when the user says "clean this data", "dedupe", "normalize", "fix formatting", or "standardize this column".
What do I get? / Deliverables
You get a scoped, type-aware cleaned range or helper columns with standardized values ready for analysis, reporting, or export.
- Profiled column types with flagged outliers
- Cleaned values or helper columns addressing whitespace, casing, numeric text, dates, and duplicates
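Helper columns are the default deliverable because the skill's own Step 4 prefers formulas over hardcoded cleaned values. The sketch below plans one helper column of formulas, assuming 1-based column numbers as openpyxl uses them. The fix names in `FORMULA_TEMPLATES` and both functions are hypothetical; only the formulas themselves mirror the examples given in the skill.

```python
# Hypothetical fix names mapped to Excel formula templates; the formulas mirror
# the examples in the skill's Step 4. "{ref}" is replaced by the source cell.
FORMULA_TEMPLATES = {
    "trim": "=TRIM({ref})",
    "strip_dollar": '=VALUE(SUBSTITUTE({ref},"$",""))',
    "upper": "=UPPER({ref})",
    "datevalue": "=DATEVALUE({ref})",
}


def column_letter(index):
    """Convert a 1-based column number to letters: 1 -> 'A', 27 -> 'AA'."""
    if index < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def helper_column_formulas(source_col, helper_col, first_row, last_row, fix):
    """Plan one helper column: {helper cell: formula referencing the source cell}.

    Nothing is computed here; Excel evaluates the formulas, so the original
    column stays untouched and every transformation remains visible.
    """
    template = FORMULA_TEMPLATES[fix]
    src, dst = column_letter(source_col), column_letter(helper_col)
    return {
        f"{dst}{row}": template.format(ref=f"{src}{row}")
        for row in range(first_row, last_row + 1)
    }
```

With openpyxl, each planned entry can be written as `ws[ref] = formula`, since a string starting with `=` is stored as a formula. openpyxl does not evaluate formulas, so the cleaned values appear only after Excel (or another calculation engine) recalculates the workbook.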
Journey fit
Spans multiple journey phases: one primary shelf plus the alternate fits listed below.
Cleaning tabular exports is the gate before the analytics, reporting, and growth decisions that solo builders run from spreadsheets. The Analytics subphase covers turning raw sheets into trustworthy inputs for metrics, cohort views, and ad-hoc analysis.
Where it fits
Sanity-check a messy churn export before you trust assumptions in a pricing or retention model.
Normalize a sponsor-provided xlsx before importing rows into your app’s staging database.
Standardize MRR and date columns so weekly KPI sheets reconcile across tabs.
Re-run duplicate removal and casing fixes on refreshed monthly ops downloads.
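The MRR and date scenario comes down to two conversions: stripping `$`, `,`, and `%` artifacts from numbers stored as text, and parsing a known set of date layouts. Below is a minimal stdlib sketch that is mainly useful for detecting which cells will not convert (they return `None`). It assumes US month/day order for strings like `3/8/26`, which is ambiguous in day-first locales. `parse_number`, `parse_date`, and `DATE_FORMATS` are hypothetical names. Per the skill's own Step 4, the values written back would normally be helper-column formulas rather than these Python results.

```python
import re
from datetime import date, datetime

# Assumed input layouts, tried in order. "%m/%d/%y" reads "3/8/26" as March 8, 2026
# (US month/day order); reorder or replace these for day-first sources.
DATE_FORMATS = (
    "%m/%d/%y",
    "%m/%d/%Y",
    "%Y-%m-%d",
    "%B %d %Y",
    "%B %d, %Y",
    "%b %d %Y",
    "%b %d, %Y",
)

# A plain decimal number, checked after currency symbols, commas and % are removed.
NUMBER_RE = re.compile(r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)")


def parse_number(value):
    """Return a float for numeric cells or number-like text, else None.

    "$1,200.50" -> 1200.5 and "12%" -> 0.12 (percent stored as a fraction,
    as Excel does). Anything that still isn't a plain number returns None.
    """
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    if not isinstance(value, str):
        return None
    text = value.replace("$", "").replace(",", "").strip()
    is_percent = text.endswith("%")
    if is_percent:
        text = text[:-1].strip()
    if not NUMBER_RE.fullmatch(text):
        return None
    number = float(text)
    return number / 100 if is_percent else number


def parse_date(value):
    """Return a datetime.date for date cells or recognised date text, else None."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    if not isinstance(value, str):
        return None
    text = " ".join(value.split())  # trim and collapse repeated spaces first
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None
```

The three sample formats from the skill's issue table (`3/8/26`, `2026-03-08`, `March 8 2026`) all parse to the same date under these assumptions, which is what makes weekly KPI tabs reconcile.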
How it compares
Use for structured spreadsheet hygiene workflows, not as a full Python pandas pipeline designer or a live database migration tool.
Common Questions / FAQ
Who is clean-data-xls for?
Solo builders and indie operators who analyze business data in Excel or xlsx files—often from finance, sales, or ops plugins—and need reliable columns before formulas or BI.
When should I use clean-data-xls?
In Validate when scoping whether messy exports support your metrics story; in Build when preparing seed datasets for apps or scripts; in Grow when fixing analytics inputs; in Operate when monthly reports arrive corrupted from exports.
Is clean-data-xls safe to install?
The skill can read and rewrite ranges in workbooks you point it at—back up files first. Review the Security Audits panel on this Prism page and avoid running it on sensitive production ledgers without a copy.
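Taking a copy is the one safety step worth automating before any cleaning run on a standalone file. Here is a minimal stdlib sketch, assuming a local file path. `backup_workbook` is a hypothetical helper, not part of the skill; it copies the workbook next to the original (or into `backup_dir`) under a timestamped name.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_workbook(path, backup_dir=None):
    """Copy a workbook to '<stem>.backup-<timestamp><suffix>' and return the copy's path."""
    src = Path(path)
    if not src.is_file():
        raise FileNotFoundError(src)
    target_dir = Path(backup_dir) if backup_dir else src.parent
    target_dir.mkdir(parents=True, exist_ok=True)
    # Microseconds in the stamp keep two backups taken in the same second distinct.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    dst = target_dir / f"{src.stem}.backup-{stamp}{src.suffix}"
    shutil.copy2(src, dst)  # copy2 also preserves the file's modification time
    return dst
```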
SKILL.md
# Clean Data

Clean messy data in the active sheet or a specified range.

## Environment

- **If running inside Excel (Office Add-in / Office JS):** Use Office JS directly (`Excel.run(async (context) => {...})`). Read via `range.values` (after `range.load("values")` and `await context.sync()`), and write helper-column formulas via `range.formulas = [["=TRIM(A2)"]]`. The in-place vs helper-column decision still applies.
- **If operating on a standalone .xlsx file:** Use Python/openpyxl.

## Workflow

### Step 1: Scope

- If a range is given (e.g. `A1:F200`), use it.
- Otherwise, use the full used range of the active sheet.
- Profile each column: detect its dominant type (text / number / date) and identify outliers.

### Step 2: Detect issues

| Issue | What to look for |
|---|---|
| Whitespace | leading/trailing spaces, double spaces |
| Casing | inconsistent casing in categorical columns (`usa` / `USA` / `Usa`) |
| Number-as-text | numeric values stored as text; stray `$`, `,`, `%` in number cells |
| Dates | mixed formats in the same column (`3/8/26`, `2026-03-08`, `March 8 2026`) |
| Duplicates | exact-duplicate rows and near-duplicates (case/whitespace differences) |
| Blanks | empty cells in otherwise-populated columns |
| Mixed types | a column that's 98% numbers but has 3 text entries |
| Encoding | mojibake (`é`, `’`), non-printing characters |
| Errors | `#REF!`, `#N/A`, `#VALUE!`, `#DIV/0!` |

### Step 3: Propose fixes

Show a summary table before changing anything:

| Column | Issue | Count | Proposed Fix |
|---|---|---|---|

### Step 4: Apply

- **Prefer formulas over hardcoded cleaned values:** where the cleaned output can be expressed as a formula (e.g. `=TRIM(A2)`, `=VALUE(SUBSTITUTE(B2,"$",""))`, `=UPPER(C2)`, `=DATEVALUE(D2)`), write the formula in an adjacent helper column rather than computing the result in Python and overwriting the original. This keeps the transformation transparent and auditable.
- Only overwrite in place with computed values when the user explicitly asks for it, or when no sensible formula equivalent exists (e.g. encoding/mojibake repair).
- For destructive operations (removing duplicates, filling blanks, overwriting originals), confirm with the user first.
- After each category of fix (whitespace → casing → number conversion → dates → dedup), show the user a sample of what changed and get confirmation before moving to the next category.
- Report a before/after summary of what changed.
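Two parts of Step 4 lend themselves to a quick sketch: grouping exact and near-duplicate rows so the user can confirm removals, and counting what changed per column for the before/after report. The version below is a minimal stdlib sketch that assumes each row is a tuple of cell values (as openpyxl's `ws.iter_rows(values_only=True)` yields them); `normalize_key`, `find_duplicates`, and `change_summary` are hypothetical names. "Near" here means equal after trimming, collapsing inner whitespace, and case-folding, matching the table's "case/whitespace differences". The code only proposes which rows to drop and deletes nothing, in line with the rule that destructive operations need confirmation first.

```python
from collections import defaultdict


def normalize_key(row):
    """Case- and whitespace-insensitive key for one row.

    Text cells are trimmed, inner whitespace runs collapsed to one space, and
    case-folded; non-text cells (numbers, dates, None) are compared as-is.
    """
    return tuple(
        " ".join(cell.split()).casefold() if isinstance(cell, str) else cell
        for cell in row
    )


def find_duplicates(rows):
    """Propose duplicate groups without deleting anything.

    Returns one dict per group, in order of first appearance:
    {"keep": first_index, "drop": [later_indexes], "kind": "exact" or "near"}.
    A group is "exact" only if every row in it is identical to the kept row.
    """
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[normalize_key(row)].append(index)

    proposals = []
    for indexes in groups.values():
        if len(indexes) < 2:
            continue
        keep = indexes[0]
        exact = all(tuple(rows[i]) == tuple(rows[keep]) for i in indexes)
        proposals.append(
            {"keep": keep, "drop": indexes[1:], "kind": "exact" if exact else "near"}
        )
    return proposals


def change_summary(before, after, headers):
    """Count changed cells per column between two equally shaped lists of rows."""
    counts = {header: 0 for header in headers}
    for old_row, new_row in zip(before, after):
        for header, old, new in zip(headers, old_row, new_row):
            if old != new:
                counts[header] += 1
    return counts
```

Keeping the first occurrence is only one reasonable default; a user may prefer to keep the most recent or most complete row, which is another reason to show the proposals before removing anything.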