
Data Analyst
Choose and apply missing-value imputation methods when cleaning datasets before analysis or modeling in a solo-built product.
Overview
Data Analyst is an agent skill, most often used in the Grow phase (also Validate and Build), that guides missing-value imputation choices and their tradeoffs so analysis datasets end up cleaner.
Install
npx skills add https://github.com/ailabs-393/ai-labs-claude-skills --skill data-analyst
What is this skill?
- Reference for mean, median, mode, and advanced imputation strategies with when-to-use guidance
- Decision framework by data type, missingness pattern, missing rate, and variable relationships
- Per-method tradeoffs: variance distortion, correlation impact, and distribution assumptions
- Example use cases such as temperature readings and height measurements
- Oriented toward maintaining sample size while preserving analysis validity
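The variance distortion named in the tradeoffs above can be seen in a few lines. This is a minimal sketch using only the Python standard library; the sample readings and the helper name `impute_mean` are hypothetical and do not come from the skill's reference.

```python
import statistics


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical temperature readings with two gaps (None = missing).
readings = [18.0, 21.0, None, 24.0, None, 27.0]

observed = [v for v in readings if v is not None]
filled = impute_mean(readings)

# Sample size is maintained and the mean is unchanged...
mean_before = statistics.fmean(observed)
mean_after = statistics.fmean(filled)

# ...but the sample variance shrinks, because every filled value
# sits exactly at the centre of the observed distribution.
var_before = statistics.variance(observed)
var_after = statistics.variance(filled)

print(f"mean: {mean_before} -> {mean_after}")
print(f"variance: {var_before} -> {var_after}")
```

The same shrinkage applies to any single-value fill (median, mode, constant), which is one reason a stronger method may be worth considering when downstream statistics depend on the spread.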
Adoption & trust: 914 installs on skills.sh; 399 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have columns with gaps and no clear rule for whether to drop rows, fill with means, or use a stronger method without biasing metrics or models.
Who is it for?
Solo builders cleaning survey, sensor, or product analytics tables with moderate missing rates who want a structured method picker.
Skip if: You need a turnkey Python notebook pipeline, an automated ML feature store, or legal-grade anonymization and do not want to read the reference yourself.
When should I use this skill?
When analyzing or cleaning datasets with missing numeric or categorical fields and you need method selection guidance.
What do I get? / Deliverables
You pick an imputation approach matched to distribution, missing rate, and variable relationships so downstream stats and dashboards stay defensible.
- Documented imputation strategy per column
- Rationale aligned to distribution and missingness pattern
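One way such a per-column record might look is sketched below in standard-library Python. It is only an illustration: the function name `pick_strategy`, the sample columns, the skew heuristic, and the `skew_tolerance` value are all assumptions made for this example. Only the under-20% missing-rate guideline for mean imputation and the mean/median/mode pairing with symmetric/skewed/categorical data come from the reference.

```python
import statistics


def pick_strategy(values, max_mean_rate=0.20, skew_tolerance=0.25):
    """Suggest an imputation method for one column, with a rationale.

    values: list where None marks a missing entry.
    skew_tolerance is an illustrative assumption, not a rule from the skill.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return "review", "no observed values"
    missing_rate = 1 - len(observed) / len(values)

    # Non-numeric columns: the most frequent value is the simple choice.
    if not all(isinstance(v, (int, float)) for v in observed):
        return "mode", f"categorical, {missing_rate:.0%} missing"

    mean = statistics.fmean(observed)
    median = statistics.median(observed)
    spread = statistics.pstdev(observed) or 1.0
    # Crude skew check: distance between mean and median, in std devs.
    if abs(mean - median) / spread > skew_tolerance:
        return "median", f"skewed, {missing_rate:.0%} missing"
    if missing_rate < max_mean_rate:
        return "mean", f"roughly symmetric, {missing_rate:.0%} missing"
    return "review", f"{missing_rate:.0%} missing, consider a stronger method"


# Hypothetical columns from a small survey table.
columns = {
    "height_cm": [168, 169, None, 170, 171, 172],
    "income": [30000, 32000, None, 35000, 31000, 250000],
    "plan": ["free", "pro", None, "free", "free"],
    "sessions": [1, None, None, 2, 3, None],
}

plan = {name: pick_strategy(vals) for name, vals in columns.items()}
for name, (method, why) in plan.items():
    print(f"{name}: {method} ({why})")
```

Keeping the rationale string next to the chosen method gives a record that can be pasted into a data-cleaning note or reviewed before any values are actually filled.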
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Analytics and measurement are the canonical home for analyst workflows; the skill content is imputation reference for analysis quality. Imputation directly supports dashboards, metrics, and experiments that solo builders track in the grow phase.
Where it fits
- Validate: Check whether imputing survey blanks preserves enough signal to demo a pricing hypothesis.
- Build: Document imputation rules before coding a small ETL script for user event tables.
- Grow: Fix incomplete funnel metrics before sharing weekly activation charts with stakeholders.
How it compares
This skill is procedural reference knowledge for imputation decisions, not a hosted notebook, BI connector, or MCP data warehouse tool.
Common Questions / FAQ
Who is data-analyst for?
Indie builders and small teams doing their own metrics, prototypes, or research datasets who need imputation guidance inside an AI coding agent.
When should I use data-analyst?
During validate when checking if a dataset supports a landing-page claim; in build when shaping ETL or features; in grow when fixing analytics gaps before reporting.
Is data-analyst safe to install?
Review the Security Audits panel on this Prism page for install source and any reported risks before adding it to your agent stack.
SKILL.md - Data Analyst
```javascript
export default async function data_analyst(input) {
  console.log("🧠 Running skill: data-analyst");
  // TODO: implement actual logic for this skill
  return {
    message: "Skill 'data-analyst' executed successfully!",
    input
  };
}
```

```json
{
  "name": "@ai-labs-claude-skills/data-analyst",
  "version": "1.0.0",
  "description": "Claude AI skill: data-analyst",
  "main": "index.js",
  "files": [
    "."
  ],
  "license": "MIT",
  "author": "AI Labs"
}
```

# Missing Value Imputation Methods Reference

This document provides detailed information about various imputation strategies and when to use them.

## Overview

Missing data is a common challenge in data analysis. The choice of imputation method significantly impacts analysis quality and should be based on:

- The type of data (numeric, categorical, temporal)
- The pattern of missingness (random, systematic)
- The percentage of missing values
- The relationship between variables

## Imputation Methods

### 1. Mean Imputation

**Description**: Replace missing values with the arithmetic mean of non-missing values.

**Best for**:
- Normally distributed numeric data
- Low to moderate missing rates (<20%)
- Variables without strong relationships to others

**Advantages**:
- Simple and fast
- Maintains sample size
- Preserves mean of the distribution

**Disadvantages**:
- Reduces variance
- Distorts correlations
- Not suitable for skewed distributions

**Example use case**: Imputing missing temperature readings, height measurements, or test scores.

---

### 2. Median Imputation

**Description**: Replace missing values with the median of non-missing values.

**Best for**:
- Skewed numeric distributions
- Data with outliers
- Ordinal data

**Advantages**:
- Robust to outliers
- Works well with skewed data
- Simple to implement

**Disadvantages**:
- Reduces variance
- May not preserve relationships between variables

**Example use case**: Imputing income data, house prices, or any right-skewed distribution.

---

### 3. Mode Imputation

**Description**: Replace missing values with the most frequent value (mode).

**Best for**:
- Categorical variables
- Binary variables
- Low-cardinality discrete variables

**Advantages**:
- Appropriate for categorical data
- Maintains most common pattern
- Simple interpretation

**Disadvantages**:
- May introduce bias if mode is not truly representative
- Reduces variability

**Example use case**: Imputing product categories, gender, yes/no responses.

---

### 4. Constant Value Imputation

**Description**: Replace missing values with a predefined constant (e.g., "Unknown", 0, -999).

**Best for**:
- High-cardinality categorical variables
- When missingness itself is informative
- Text fields

**Advantages**:
- Makes missingness explicit
- Useful for categorical analysis
- Simple to implement

**Disadvantages**:
- May create artificial category
- Not suitable for numeric analysis without transformation

**Example use case**: Imputing missing comments, optional survey fields, or product descriptions.

---

### 5. Forward Fill (LOCF - Last Observation Carried Forward)

**Description**: Replace missing values with the last observed value in sequence.

**Best for**:
- Time series data
- Sequential measurements
- Slowly changing variables

**Advantages**:
- Preserves temporal patterns
- Logical for continuous processes
- Maintains smooth transitions

**Disadvantages**:
- Assumes stability over time
- Can propagate measurement errors
- Not suitable for volatile data

**Example use case**: Imputing stock prices, sensor readings, or patient vital signs.

---

### 6. Backward Fill (NOCB - Next Observation Carried Backward)

**Description**: Replace missing values with the next observed value in sequence.

**Best for**:
- Time series with forward-looking data
- When future values are more relevant

**Advantages**:
- Useful for certain temporal patterns
- Complements forward fill

**Disadvantages**:
- Less intuitive than forward fill
- May not reflect actual pro