Data Analyst

Analytics and measurement are the canonical home for analyst workflows; the skill content is an imputation reference for analysis quality. Imputation directly supports the dashboards, metrics, and experiments that solo builders track in the grow phase.

Also useful

Where it fits

Example use

Check whether imputing survey blanks preserves enough signal to demo a pricing hypothesis.

Example use

Document imputation rules before coding a small ETL script for user event tables.

Example use

Fix incomplete funnel metrics before sharing weekly activation charts with stakeholders.

How it compares

Procedural reference knowledge for imputation decisions, not a hosted notebook, BI connector, or MCP data warehouse tool.

Common Questions / FAQ

Who is data-analyst for?

Indie builders and small teams doing their own metrics, prototypes, or research datasets who need imputation guidance inside an AI coding agent.

When should I use data-analyst?

During validate, when checking whether a dataset supports a landing-page claim; in build, when shaping ETL or features; in grow, when fixing analytics gaps before reporting.

Is data-analyst safe to install?

Review the Security Audits panel on this Prism page for install source and any reported risks before adding it to your agent stack.

SKILL.md

README / SKILL.md - Data Analyst

export default async function data_analyst(input) {
  console.log("🧠 Running skill: data-analyst");
  
  // TODO: implement actual logic for this skill
  return {
    message: "Skill 'data-analyst' executed successfully!",
    input
  };
}


{
  "name": "@ai-labs-claude-skills/data-analyst",
  "version": "1.0.0",
  "description": "Claude AI skill: data-analyst",
  "main": "index.js",
  "files": [
    "."
  ],
  "license": "MIT",
  "author": "AI Labs"
}

# Missing Value Imputation Methods Reference

This document provides detailed information about various imputation strategies and when to use them.

## Overview

Missing data is a common challenge in data analysis. The choice of imputation method significantly impacts analysis quality and should be based on:
- The type of data (numeric, categorical, temporal)
- The pattern of missingness (random, systematic)
- The percentage of missing values
- The relationship between variables
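
Of the factors above, the percentage of missing values is the easiest to measure before choosing a method. The following is a minimal sketch, assuming rows are plain objects and that `null`, `undefined`, and `NaN` all count as missing; `isMissing` and `missingRates` are hypothetical helper names, not part of this skill.

```javascript
// Hypothetical helper: treats null, undefined, and NaN as missing.
function isMissing(value) {
  return value === null || value === undefined || Number.isNaN(value);
}

// Returns the fraction of missing values (0 to 1) for each named column.
function missingRates(rows, columns) {
  const rates = {};
  for (const column of columns) {
    const missing = rows.filter((row) => isMissing(row[column])).length;
    rates[column] = rows.length === 0 ? 0 : missing / rows.length;
  }
  return rates;
}

const sampleRows = [
  { age: 34, plan: "pro" },
  { age: null, plan: "free" },
  { age: 29, plan: null },
  { age: null, plan: "free" },
];

// age is missing in 2 of 4 rows, plan in 1 of 4.
console.log(missingRates(sampleRows, ["age", "plan"]));
```

A column whose rate sits well above the thresholds mentioned for a given method is a signal to reconsider that method rather than a hard rule.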

## Imputation Methods

### 1. Mean Imputation

**Description**: Replace missing values with the arithmetic mean of non-missing values.

**Best for**:
- Normally distributed numeric data
- Low to moderate missing rates (<20%)
- Variables without strong relationships to others

**Advantages**:
- Simple and fast
- Maintains sample size
- Preserves mean of the distribution

**Disadvantages**:
- Reduces variance
- Distorts correlations
- Not suitable for skewed distributions

**Example use case**: Imputing missing temperature readings, height measurements, or test scores.
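
As an illustration, mean imputation and its variance-reducing side effect can be sketched in a few lines. This assumes a numeric array where `null`, `undefined`, or `NaN` marks a gap; `imputeMean` and `variance` are hypothetical names used only for this example.

```javascript
// Hypothetical helper: treats null, undefined, and NaN as missing.
function isMissing(value) {
  return value === null || value === undefined || Number.isNaN(value);
}

// Replaces each missing entry with the mean of the observed entries.
function imputeMean(values) {
  const observed = values.filter((v) => !isMissing(v));
  if (observed.length === 0) return values.slice(); // nothing to estimate from
  const mean = observed.reduce((sum, v) => sum + v, 0) / observed.length;
  return values.map((v) => (isMissing(v) ? mean : v));
}

// Population variance, used here only to show the shrinkage effect.
function variance(values) {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  return values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
}

const temperatures = [20, null, 22, 24, null, 18];
const filledTemperatures = imputeMean(temperatures);

console.log(filledTemperatures); // gaps become 21, the observed mean
console.log(variance([20, 22, 24, 18])); // variance of observed values only
console.log(variance(filledTemperatures)); // smaller after imputation
```

The mean is unchanged by the fill, but every imputed point sits exactly at the center, which is why the spread shrinks.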

---

### 2. Median Imputation

**Description**: Replace missing values with the median of non-missing values.

**Best for**:
- Skewed numeric distributions
- Data with outliers
- Ordinal data

**Advantages**:
- Robust to outliers
- Works well with skewed data
- Simple to implement

**Disadvantages**:
- Reduces variance
- May not preserve relationships between variables

**Example use case**: Imputing income data, house prices, or any right-skewed distribution.
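
A minimal sketch of median imputation follows, assuming a numeric array with `null`, `undefined`, or `NaN` as the missing marker. The function name `imputeMedian` and the income figures are made up for illustration.

```javascript
// Hypothetical helper: treats null, undefined, and NaN as missing.
function isMissing(value) {
  return value === null || value === undefined || Number.isNaN(value);
}

// Replaces each missing entry with the median of the observed entries.
function imputeMedian(values) {
  const observed = values.filter((v) => !isMissing(v)).sort((a, b) => a - b);
  if (observed.length === 0) return values.slice(); // nothing to estimate from
  const mid = Math.floor(observed.length / 2);
  const median =
    observed.length % 2 === 1
      ? observed[mid]
      : (observed[mid - 1] + observed[mid]) / 2;
  return values.map((v) => (isMissing(v) ? median : v));
}

// One extreme value (950000) would pull a mean fill up to 270250.
const incomes = [42000, null, 38000, 51000, 950000, null];
console.log(imputeMedian(incomes)); // gaps become 46500
```

With the same data, a mean fill would place the imputed incomes far above every typical observation, which is the outlier sensitivity the median avoids.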

---

### 3. Mode Imputation

**Description**: Replace missing values with the most frequent value (mode).

**Best for**:
- Categorical variables
- Binary variables
- Low-cardinality discrete variables

**Advantages**:
- Appropriate for categorical data
- Maintains most common pattern
- Simple interpretation

**Disadvantages**:
- May introduce bias if mode is not truly representative
- Reduces variability

**Example use case**: Imputing product categories, gender, yes/no responses.
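
Mode imputation can be sketched as a frequency count followed by a fill. This example assumes categorical values stored as strings, with `null` or `undefined` marking a gap; `imputeMode` is a hypothetical name, and breaking ties in favor of the first value seen is an arbitrary choice made for this sketch.

```javascript
// Hypothetical helper: treats null, undefined, and NaN as missing.
function isMissing(value) {
  return value === null || value === undefined || Number.isNaN(value);
}

// Replaces each missing entry with the most frequent observed value.
// Ties go to whichever value appeared first (an assumption of this sketch).
function imputeMode(values) {
  const counts = new Map();
  for (const v of values) {
    if (!isMissing(v)) counts.set(v, (counts.get(v) ?? 0) + 1);
  }
  if (counts.size === 0) return values.slice(); // nothing to estimate from

  let mode;
  let best = 0;
  for (const [value, count] of counts) {
    if (count > best) {
      mode = value;
      best = count;
    }
  }
  return values.map((v) => (isMissing(v) ? mode : v));
}

const categories = ["books", null, "toys", "books", null, "games"];
console.log(imputeMode(categories)); // gaps become "books"
```

Note how the fill raises the share of "books" from 2 of 4 observed values to 4 of 6, which is the bias risk listed above.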

---

### 4. Constant Value Imputation

**Description**: Replace missing values with a predefined constant (e.g., "Unknown", 0, -999).

**Best for**:
- High-cardinality categorical variables
- When missingness itself is informative
- Text fields

**Advantages**:
- Makes missingness explicit
- Useful for categorical analysis
- Simple to implement

**Disadvantages**:
- May create artificial category
- Not suitable for numeric analysis without transformation

**Example use case**: Imputing missing comments, optional survey fields, or product descriptions.
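
A minimal sketch of constant-value imputation is shown here. Pairing the fill with a separate boolean indicator is one way to keep missingness usable as a signal; it is an assumption of this example rather than something this reference prescribes, and `imputeConstant` and `missingIndicator` are hypothetical names.

```javascript
// Hypothetical helper: treats null, undefined, and NaN as missing.
function isMissing(value) {
  return value === null || value === undefined || Number.isNaN(value);
}

// Replaces each missing entry with the supplied constant.
function imputeConstant(values, fillValue) {
  return values.map((v) => (isMissing(v) ? fillValue : v));
}

// Records which positions were missing before the fill.
function missingIndicator(values) {
  return values.map((v) => isMissing(v));
}

const comments = ["Great", null, undefined, "Slow shipping"];
console.log(imputeConstant(comments, "Unknown"));
console.log(missingIndicator(comments)); // true where a comment was absent
```

Computing the indicator before filling matters: once gaps are replaced with "Unknown", they can no longer be told apart from a respondent who literally typed that word.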

---

### 5. Forward Fill (LOCF - Last Observation Carried Forward)

**Description**: Replace missing values with the last observed value in sequence.

**Best for**:
- Time series data
- Sequential measurements
- Slowly changing variables

**Advantages**:
- Preserves temporal patterns
- Logical for continuous processes
- Maintains smooth transitions

**Disadvantages**:
- Assumes stability over time
- Can propagate measurement errors
- Not suitable for volatile data

**Example use case**: Imputing stock prices across non-trading days, sensor readings, or patient vital signs.
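
Forward fill can be sketched as a single pass that remembers the last observed value. This assumes the array is already in time order and that `null`, `undefined`, or `NaN` marks a gap; `forwardFill` is a hypothetical name. Leaving leading gaps untouched is a choice made for this sketch, since there is no earlier observation to carry.

```javascript
// Hypothetical helper: treats null, undefined, and NaN as missing.
function isMissing(value) {
  return value === null || value === undefined || Number.isNaN(value);
}

// Carries the last observed value forward into each following gap.
// Gaps before the first observation are left as they are.
function forwardFill(values) {
  let last;
  let hasLast = false;
  return values.map((v) => {
    if (!isMissing(v)) {
      last = v;
      hasLast = true;
      return v;
    }
    return hasLast ? last : v;
  });
}

const readings = [null, 10.1, null, null, 10.4, null];
console.log(forwardFill(readings)); // [null, 10.1, 10.1, 10.1, 10.4, 10.4]
```

The two consecutive gaps after 10.1 both receive the same stale value, which shows how a single bad reading would be repeated across a long run of missing points.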

---

### 6. Backward Fill (NOCB - Next Observation Carried Backward)

**Description**: Replace missing values with the next observed value in sequence.

**Best for**:
- Time series with forward-looking data
- When future values are more relevant

**Advantages**:
- Useful for certain temporal patterns
- Complements forward fill

**Disadvantages**:
- Less intuitive than forward fill
- May not reflect actual pro

What is this skill?

Reference for mean, median, mode, and advanced imputation strategies with when-to-use guidance

Decision framework by data type, missingness pattern, missing rate, and variable relationships

Per-method tradeoffs: variance distortion, correlation impact, and distribution assumptions

Example use cases such as temperature readings and height measurements

Oriented toward maintaining sample size while preserving analysis validity

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 914 installs on skills.sh; 399 GitHub stars; 3/3 security scanners passed (skills.sh audits).

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit
