
Pandas Data Analysis
Load, clean, aggregate, and visualize tabular data with Pandas, NumPy, and Matplotlib when building analytics features or exploring datasets.
Overview
Pandas Data Analysis is an agent skill that teaches Pandas, NumPy, and Matplotlib patterns for cleaning, aggregating, and visualizing tabular data. It is most often used in the Build phase, and also fits Validate (scoping) and Grow (analytics).
Install
npx skills add https://github.com/pluginagentmarketplace/custom-plugin-python --skill pandas-data-analysis
What is this skill?
- Covers six learning areas from DataFrames through large-dataset performance
- Documents loc, iloc, boolean indexing, and department-style filter examples
- Spans CSV, Excel, SQL, and API data sources in stated objectives
- Pairs exploratory analysis with Matplotlib visualization outcomes
- Bonded as PRIMARY_BOND to a data-science agent persona in frontmatter
- Skill version 2.1.0 in frontmatter
Adoption & trust: 866 installs on skills.sh; 5 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have raw CSV or API data but lack a consistent Pandas workflow for filtering, grouping, and charting that your agent can repeat.
Who is it for?
Indie builders adding in-app analytics, internal reports, or data pipelines where Python and Pandas are already on the stack.
Skip if: you are a pure spreadsheet user with no Python runtime, or a team that only needs a one-click BI SaaS with no custom code.
When should I use this skill?
Use it when the agent must manipulate, analyze, or visualize structured data with Pandas as part of a workflow bonded to the data-science agent.
What do I get? / Deliverables
You produce cleaned DataFrames, exploratory summaries, and Matplotlib visuals using documented indexing and transformation patterns.
- Pandas analysis code for load, clean, aggregate, and plot steps
- Filtered and grouped DataFrames aligned to business questions
- Matplotlib visualizations supporting EDA conclusions
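The deliverables above can be sketched as one small script. This is a minimal illustration under assumed inputs, not the skill's own code: the inline CSV, the column names (`order_id`, `category`, `price`), and the helper `plot_totals` are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical raw export; a real run would pass a file path to read_csv.
RAW_CSV = """order_id,category,price
1,books,12.5
2,games,
2,games,
3,,40.0
4,books,7.5
"""

# Load
orders = pd.read_csv(io.StringIO(RAW_CSV))

# Clean: drop repeated orders, then fill gaps with explicit defaults.
orders = orders.drop_duplicates(subset=["order_id"], keep="first")
orders["category"] = orders["category"].fillna("unknown")
orders["price"] = orders["price"].fillna(orders["price"].median())

# Aggregate: total price per category, largest first.
totals = orders.groupby("category")["price"].sum().sort_values(ascending=False)


def plot_totals(series):
    """Plot step; Matplotlib is imported lazily so the rest runs without it."""
    import matplotlib

    matplotlib.use("Agg")  # render off-screen, e.g. in a batch job
    ax = series.plot.bar(title="Total price by category")
    ax.set_ylabel("price")
    return ax
```

Assigning the result of `fillna` back to the column, rather than calling it with `inplace=True` on a column selection, keeps the cleaning step working under pandas Copy-on-Write.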
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Build/backend is the canonical shelf because the skill teaches implementation-time manipulation and analysis code, not just pre-build market research. Backend fits DataFrame operations, SQL/API loads, grouping, and performance patterns that ship inside products and scripts.
Where it fits
Profile a sample export to see if pricing or cohort columns support a landing-page claim before full build.
Implement department filters and salary aggregations like the IT employee example for an admin API.
Refresh weekly charts from CRM CSV pulls using the same loc/iloc and groupby patterns.
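As a rough sketch of the Build scenario above, the department filter and salary aggregation could be wrapped in one function that returns JSON-ready records. The function name `department_salary_summary` and the output field names are assumptions for illustration; the sample rows reuse the employee example from the skill's SKILL.md.

```python
import pandas as pd

employees = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "David"],
    "salary": [50000, 60000, 75000, 55000],
    "department": ["IT", "HR", "IT", "Sales"],
})


def department_salary_summary(df, department=None):
    """Return per-department salary stats as a list of plain dicts.

    Hypothetical helper: `department` optionally narrows the result
    to one department before aggregating.
    """
    if department is not None:
        # Boolean mask with loc keeps only the requested department.
        df = df.loc[df["department"] == department]
    stats = (
        df.groupby("department")["salary"]
        .agg(headcount="count", mean_salary="mean", max_salary="max")
        .reset_index()
    )
    return stats.to_dict(orient="records")


it_summary = department_salary_summary(employees, department="IT")
```

Returning a list of dicts keeps the function independent of any web framework, so an endpoint handler only needs to serialize the result.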
How it compares
A pedagogical Pandas playbook, not a hosted notebook product or a dedicated ETL orchestrator.
Common Questions / FAQ
Who is Pandas Data Analysis for?
Builders and agent operators who analyze structured data in Python and want standardized EDA, cleaning, and visualization steps.
When should I use Pandas Data Analysis?
In Validate when scoping a dataset for a prototype, in Build when implementing backend analytics endpoints or batch jobs, or in Grow when refining lifecycle metrics from exported user data.
Is Pandas Data Analysis safe to install?
The skill consists of instructional code patterns, but running them still reads local data and may call external APIs. Check the Security Audits panel on this page before running bonded agents on sensitive tables.
SKILL.md - Pandas Data Analysis
# Pandas Data Analysis

## Overview

Master data analysis with Pandas, the powerful Python library for data manipulation and analysis. Learn to clean, transform, analyze, and visualize data effectively.

## Learning Objectives

- Load and manipulate data from various sources (CSV, Excel, SQL, APIs)
- Clean and transform messy datasets
- Perform exploratory data analysis (EDA)
- Aggregate and group data for insights
- Create compelling visualizations
- Optimize performance for large datasets

## Core Topics

### 1. Pandas DataFrames & Series

- Creating DataFrames from various sources
- Indexing and selecting data (loc, iloc, at, iat)
- Filtering and boolean indexing
- Adding/removing columns and rows
- Data types and conversions

**Code Example:**

```python
import pandas as pd
import numpy as np

# Create DataFrame
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 28],
    'salary': [50000, 60000, 75000, 55000],
    'department': ['IT', 'HR', 'IT', 'Sales']
}
df = pd.DataFrame(data)

# Indexing and filtering
it_employees = df[df['department'] == 'IT']
high_earners = df.loc[df['salary'] > 55000, ['name', 'salary']]

# Adding calculated columns
df['annual_bonus'] = df['salary'] * 0.10
df['age_group'] = pd.cut(df['age'], bins=[0, 30, 40, 100],
                         labels=['Young', 'Mid', 'Senior'])

print(df)
```

### 2. Data Cleaning & Transformation

- Handling missing data (dropna, fillna, interpolate)
- Removing duplicates
- String operations and text cleaning
- Date/time parsing and manipulation
- Type conversions and casting
- Applying custom functions (apply, map, applymap; note that `DataFrame.applymap` is deprecated since pandas 2.1 in favor of `DataFrame.map`)

**Code Example:**

```python
import pandas as pd

# Load data with missing values
df = pd.read_csv('sales_data.csv')

# Handle missing values
# Assign the result back instead of using inplace=True on a column selection,
# which warns in pandas 2.x and has no effect under Copy-on-Write.
df['price'] = df['price'].fillna(df['price'].median())
df['category'] = df['category'].fillna('Unknown')
df = df.dropna(subset=['customer_id'])

# Clean text data
df['product_name'] = df['product_name'].str.strip().str.lower()
df['product_name'] = df['product_name'].str.replace('[^a-zA-Z0-9 ]', '', regex=True)

# Convert dates
df['order_date'] = pd.to_datetime(df['order_date'])
df['year'] = df['order_date'].dt.year
df['month'] = df['order_date'].dt.month

# Remove duplicates
df = df.drop_duplicates(subset=['order_id'], keep='first')

# Apply custom function
def categorize_price(price):
    if price < 50:
        return 'Low'
    elif price < 100:
        return 'Medium'
    else:
        return 'High'

df['price_category'] = df['price'].apply(categorize_price)
```

### 3. Aggregation & Grouping

- GroupBy operations
- Aggregation functions (sum, mean, count, etc.)
- Pivot tables and cross-tabulation
- Multi-level indexing
- Window functions (rolling, expanding)

**Code Example:**

```python
import pandas as pd

# Sample sales data
# Assumes sales.csv contains every column referenced below
# (department, salary, employee_id, region, product_category,
# quarter, product_id, sales).
df = pd.read_csv('sales.csv')

# GroupBy aggregation
dept_stats = df.groupby('department').agg({
    'salary': ['mean', 'min', 'max'],
    'employee_id': 'count'
})

# Multiple groupby
sales_by_region_product = df.groupby(['region', 'product_category'])['sales'].sum()

# Pivot table
pivot = df.pivot_table(
    values='sales',
    index='product_category',
    columns='quarter',
    aggfunc='sum',
    fill_value=0
)

# Rolling window (moving average over the last 7 rows per product;
# this equals 7 days only if there is one row per product per day, sorted by date)
df['sales_ma_7d'] = df.groupby('product_id')['sales'].transform(
    lambda x: x.rolling(window=7, min_periods=1).mean()
)

# Cumulative sum
df['cumulative_sales'] = df.groupby('product_id')['sales'].cumsum()
```

### 4. Data Visualization

- Matplotlib basics
- Seaborn for statistical plots
- Pandas built-in plotting
- Customizing plots
- Creating dashboards

**Code Exampl