
Pandas Pro
Summarize and transform tabular data with pandas GroupBy, named aggregation, pivot tables, and crosstab patterns for reports and product metrics.
Overview
Pandas Pro is an agent skill most often used in Grow (also Build backend, Validate scope) that teaches GroupBy, pivot, and named aggregation patterns in pandas 2.0+.
Install
npx skills add https://github.com/jeffallan/claude-skills --skill pandas-proWhat is this skill?
- GroupBy fundamentals: single and multi-column keys with mean, min, max, count, and reset_index DataFrame output
- pandas 2.0+ named aggregation syntax for readable multi-metric department or cohort rollups
- Dictionary and multi-aggregation patterns across salary, tenure, and headcount-style columns
- Pivot tables, crosstab, and advanced aggregation patterns beyond basic groupby
- Worked examples on realistic employee/department frames for copy-paste adaptation
- Reference covers pandas 2.0+ named aggregation as the preferred API
- Documents GroupBy, pivot tables, crosstab, and advanced aggregation patterns
Adoption & trust: 3.3k installs on skills.sh; 9.7k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have messy tabular data and need correct group summaries, pivots, and multi-metric aggregations without silent pandas API footguns.
Who is it for?
Indie builders analyzing exports, building lightweight analytics scripts, or embedding pandas summaries in Python backends and data notebooks.
Skip if: Teams that need distributed Spark/DuckDB warehouse SQL only, or greenfield projects with no Python tabular processing.
When should I use this skill?
When implementing or debugging pandas GroupBy, pivot tables, named aggregations, or summary statistics on DataFrames in Python analytics or backend scripts.
What do I get? / Deliverables
After the skill runs, you get copy-ready aggregation code using modern named agg and GroupBy patterns that produce clear summary DataFrames for reports or downstream features.
- GroupBy and named-aggregation code blocks adapted to your schema
- Pivot or crosstab summaries ready for charts or CSV export
- Multi-metric rollup DataFrames with reset index for downstream APIs
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Aggregation and grouping are the backbone of metrics, cohort views, and operational reporting solo builders rely on after they have data flowing. Analytics is the canonical shelf because the reference focuses on summary statistics, pivot structures, and exploration—not shipping application UI or raw ETL infrastructure alone.
Where it fits
Aggregate prototype user survey or waitlist CSVs to decide which segment deserves a MVP.
Implement nightly job rollups with named agg before exposing metrics APIs.
Build department-style cohort reports from billing and usage exports.
How it compares
Use instead of guessing deprecated agg dict syntax when you want pandas 2.0+ named aggregation as the default pattern.
Common Questions / FAQ
Who is pandas-pro for?
It is for solo builders and small teams using Python pandas for rollups, pivots, and exploration in Claude Code, Cursor, Codex, or generic coding agents.
When should I use pandas-pro?
Use it during Validate to sanity-check datasets, during Build when backend jobs aggregate user events, and during Grow when you produce lifecycle or revenue analytics from CSV or warehouse extracts.
Is pandas-pro safe to install?
Check the Security Audits panel on this Prism page; the skill is reference-oriented code generation and does not inherently require secrets, but agents may read local data files you point them at.
SKILL.md
READMESKILL.md - Pandas Pro
# Aggregation and GroupBy --- ## Overview Aggregation transforms data from individual records to summary statistics. This reference covers GroupBy, pivot tables, crosstab, and advanced aggregation patterns with pandas 2.0+. --- ## GroupBy Fundamentals ### Basic GroupBy ```python import pandas as pd import numpy as np df = pd.DataFrame({ 'department': ['Eng', 'Eng', 'Sales', 'Sales', 'Eng', 'HR'], 'team': ['Backend', 'Frontend', 'East', 'West', 'Backend', 'Recruit'], 'employee': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank'], 'salary': [80000, 75000, 65000, 70000, 85000, 60000], 'years': [5, 3, 7, 4, 6, 2] }) # Single column groupby with single aggregation avg_salary = df.groupby('department')['salary'].mean() # Multiple aggregations stats = df.groupby('department')['salary'].agg(['mean', 'min', 'max', 'count']) # GroupBy multiple columns grouped = df.groupby(['department', 'team'])['salary'].mean() # Reset index to get DataFrame instead of Series grouped = df.groupby('department')['salary'].mean().reset_index() ``` ### Multiple Columns, Multiple Aggregations ```python # Named aggregation (pandas 2.0+ preferred) result = df.groupby('department').agg( avg_salary=('salary', 'mean'), max_salary=('salary', 'max'), total_years=('years', 'sum'), headcount=('employee', 'count'), ) # Dictionary syntax (traditional) result = df.groupby('department').agg({ 'salary': ['mean', 'max', 'std'], 'years': ['sum', 'mean'], }) # Flatten multi-level column names result.columns = ['_'.join(col).strip() for col in result.columns.values] ``` ### Custom Aggregation Functions ```python # Lambda functions result = df.groupby('department').agg({ 'salary': lambda x: x.max() - x.min(), # Range 'years': lambda x: x.quantile(0.75), # 75th percentile }) # Named functions for clarity def salary_range(x): return x.max() - x.min() def coefficient_of_variation(x): return x.std() / x.mean() if x.mean() != 0 else 0 result = df.groupby('department').agg( salary_range=('salary', salary_range), salary_cv=('salary', coefficient_of_variation), ) # Multiple custom functions result = df.groupby('department')['salary'].agg([ ('range', lambda x: x.max() - x.min()), ('iqr', lambda x: x.quantile(0.75) - x.quantile(0.25)), ('median', 'median'), ]) ``` --- ## Transform and Apply ### Transform - Returns Same Shape ```python # Transform returns Series with same index as original # Useful for adding aggregated values back to original DataFrame # Add group mean as new column df['dept_avg_salary'] = df.groupby('department')['salary'].transform('mean') # Normalize within group df['salary_zscore'] = df.groupby('department')['salary'].transform( lambda x: (x - x.mean()) / x.std() ) # Rank within group df['salary_rank'] = df.groupby('department')['salary'].transform('rank', ascending=False) # Percentage of group total df['salary_pct'] = df.groupby('department')['salary'].transform( lambda x: x / x.sum() * 100 ) # Fill missing with group mean df['salary'] = df.groupby('department')['salary'].transform( lambda x: x.fillna(x.mean()) ) ``` ### Apply - Flexible Operations ```python # Apply runs function on each group DataFrame def top_n_by_salary(group, n=2): return group.nlargest(n, 'salary') top_earners = df.groupby('department').apply(top_n_by_salary, n=2) # Reset index after apply top_earners = df.groupby('department', group_keys=False).apply( top_n_by_salary, n=2 ).reset_index(drop=True) # Complex group operations def group_summary(group): return pd.Series({ 'headcount': len(group), 'avg_salary': group['salary'].mean(), 'top_earner': group.loc[group['salary'].idxmax(), 'employee'], 'avg_tenure': group['years'].mean(), }) summary = df.groupby('department').apply(group_summary) ``` ### Filter - Keep/Remove Groups ```python # Keep only groups meeting a condition # Groups with average salary > 70000 filte