Pandas Data Analysis

Name: Pandas Data Analysis
Author: pluginagentmarketplace

pluginagentmarketplace/custom-plugin-python

1k installs
5 repo stars
Updated January 5, 2026

Pandas Data Analysis is a Python agent skill (v2.1.0) that guides loading, cleaning, aggregating, and visualizing tabular data with Pandas, NumPy, and Matplotlib. It is aimed at developers building analytics features or exploring datasets.

About

Pandas Data Analysis is a Python skill (version 2.1.0, sasmp_version 1.3.0) from pluginagentmarketplace/custom-plugin-python for tabular data work with Pandas, NumPy, and Matplotlib. It guides agents through loading data from CSV, Excel, SQL, and APIs, cleaning messy datasets, transforming and aggregating columns, and producing visualizations for analysis features. Developers reach for it when prototyping analytics endpoints, exploring datasets inside an agent session, or implementing data manipulation logic without hand-rolling every transform. The skill bonds as PRIMARY_BOND to the 03-data-science agent, with exponential backoff retries and a data_processing_time metric for observability during processing runs.

Covers six learning areas from DataFrames through large-dataset performance
Documents loc, iloc, boolean indexing, and department-style filter examples
Spans CSV, Excel, SQL, and API data sources in stated objectives
Pairs exploratory analysis with Matplotlib visualization outcomes
Bonded as PRIMARY_BOND to a data-science agent persona in frontmatter

Pandas Data Analysis by the numbers

1,031 all-time installs (skills.sh)
+25 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #290 of 2,066 Data Science & ML skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

npx skills add https://github.com/pluginagentmarketplace/custom-plugin-python --skill pandas-data-analysis



How do you clean and analyze CSV data with Pandas?

Load, clean, aggregate, and visualize tabular data with Pandas, NumPy, and Matplotlib when building analytics features or exploring datasets.

Who is it for?

Python developers building analytics features who need guided Pandas, NumPy, and Matplotlib workflows for loading, cleaning, and visualizing tabular data.

Skip if: you need real-time stream processing at scale, your task is frontend charting only, or your team needs a ClickHouse production deployment rather than notebook-style analysis.

When should I use this skill?

The user loads CSV, Excel, SQL, or API data, or asks to clean, transform, aggregate, or visualize tabular datasets with Pandas or Matplotlib.

What you get

Cleaned DataFrames, aggregation outputs, Matplotlib visualizations, and analysis-ready tabular datasets from multiple sources.

cleaned DataFrame
aggregation output
Matplotlib chart

By the numbers

Skill version 2.1.0 with sasmp_version 1.3.0
PRIMARY_BOND to 03-data-science bonded agent

Files

SKILL.md (Markdown)

Pandas Data Analysis

Overview

Master data analysis with Pandas, the powerful Python library for data manipulation and analysis. Learn to clean, transform, analyze, and visualize data effectively.

Learning Objectives

Load and manipulate data from various sources (CSV, Excel, SQL, APIs)
Clean and transform messy datasets
Perform exploratory data analysis (EDA)
Aggregate and group data for insights
Create compelling visualizations
Optimize performance for large datasets
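
The first objective can be illustrated with a minimal sketch, not taken from the skill itself. To stay self-contained it assumes stand-ins for the real sources: an in-memory string instead of a CSV file, an in-memory SQLite database instead of a production connection, and a hard-coded JSON string instead of an HTTP response. All table and column names are hypothetical.

```python
import io
import json
import sqlite3

import pandas as pd

# CSV: StringIO stands in for a file path such as 'sales_data.csv'.
csv_text = "order_id,price\n1,9.99\n2,24.50\n"
csv_df = pd.read_csv(io.StringIO(csv_text))

# SQL: an in-memory SQLite database stands in for a real connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "EU"), (2, "US")])
sql_df = pd.read_sql_query("SELECT order_id, region FROM orders", conn)
conn.close()

# API: a hard-coded JSON string stands in for an HTTP response body.
payload = '[{"order_id": 1, "customer": {"id": 7}}, {"order_id": 2, "customer": {"id": 9}}]'
api_df = pd.json_normalize(json.loads(payload))  # nested keys become 'customer.id'

# Excel would use pd.read_excel(path), which needs an engine such as openpyxl.
merged = csv_df.merge(sql_df, on="order_id").merge(api_df, on="order_id")
print(merged)
```

Whatever the source, each reader returns an ordinary DataFrame, so the cleaning and aggregation steps that follow do not depend on where the data came from.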

Core Topics

1. Pandas DataFrames & Series

Creating DataFrames from various sources
Indexing and selecting data (loc, iloc, at, iat)
Filtering and boolean indexing
Adding/removing columns and rows
Data types and conversions

Code Example:

import pandas as pd
import numpy as np

# Create DataFrame
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 28],
    'salary': [50000, 60000, 75000, 55000],
    'department': ['IT', 'HR', 'IT', 'Sales']
}
df = pd.DataFrame(data)

# Indexing and filtering
it_employees = df[df['department'] == 'IT']
high_earners = df.loc[df['salary'] > 55000, ['name', 'salary']]

# Adding calculated columns
df['annual_bonus'] = df['salary'] * 0.10
# Bins are right-inclusive by default: (0, 30] -> Young, (30, 40] -> Mid, (40, 100] -> Senior
df['age_group'] = pd.cut(df['age'], bins=[0, 30, 40, 100], labels=['Young', 'Mid', 'Senior'])

print(df)
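
The example above covers boolean filtering and loc, but not the other items in the topic list. The following sketch, with a hypothetical employee index and values, shows label versus position access (loc, iloc), scalar access (at, iat), a type conversion, and adding and removing rows and columns.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Charlie", "David"],
        "age": ["25", "30", "35", "28"],  # stored as strings on purpose
        "salary": [50000, 60000, 75000, 55000],
    },
    index=["e1", "e2", "e3", "e4"],  # hypothetical employee labels
)

# Type conversion: cast the string column to integers.
df["age"] = df["age"].astype("int64")

# loc selects by label, iloc by integer position.
by_label = df.loc["e2", "salary"]   # Bob's salary
by_position = df.iloc[0, 0]         # first row, first column

# at / iat are the scalar-only counterparts for single-cell access.
df.at["e4", "salary"] = 58000
first_age = df.iat[0, 1]

# Add a row with concat, then drop a column and a row.
new_row = pd.DataFrame({"name": ["Eve"], "age": [41], "salary": [82000]}, index=["e5"])
df = pd.concat([df, new_row])
trimmed = df.drop(columns=["age"]).drop(index=["e1"])
```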

2. Data Cleaning & Transformation

Handling missing data (dropna, fillna, interpolate)
Removing duplicates
String operations and text cleaning
Date/time parsing and manipulation
Type conversions and casting
Applying custom functions (apply, map; applymap is deprecated since pandas 2.1 in favor of DataFrame.map)

Code Example:

import pandas as pd

# Load data with missing values
df = pd.read_csv('sales_data.csv')

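# Handle missing values
# Assign the result back instead of calling fillna(inplace=True) on a column:
# the chained inplace form is deprecated and does not update df under Copy-on-Write.
df['price'] = df['price'].fillna(df['price'].median())
df['category'] = df['category'].fillna('Unknown')
df = df.dropna(subset=['customer_id'])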

# Clean text data
df['product_name'] = df['product_name'].str.strip().str.lower()
df['product_name'] = df['product_name'].str.replace('[^a-zA-Z0-9 ]', '', regex=True)

# Convert dates
df['order_date'] = pd.to_datetime(df['order_date'])
df['year'] = df['order_date'].dt.year
df['month'] = df['order_date'].dt.month

# Remove duplicates
df.drop_duplicates(subset=['order_id'], keep='first', inplace=True)

# Apply custom function
def categorize_price(price):
    if price < 50:
        return 'Low'
    elif price < 100:
        return 'Medium'
    else:
        return 'High'

df['price_category'] = df['price'].apply(categorize_price)
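
The topic list also mentions interpolate, type casting, and map, which the example above does not exercise. Below is a small sketch on made-up order data; the column names and the status lookup table are assumptions for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "order_id": [1, 2, 2, 3, 4],
        "order_date": ["2024-01-01", "2024-01-02", "2024-01-02", "not a date", "2024-01-05"],
        "price": ["10.0", "n/a", "n/a", "30.0", "40.0"],
        "status": ["S", "P", "P", "S", "C"],
    }
)

# Remove repeated orders before filling gaps.
df = df.drop_duplicates(subset=["order_id"], keep="first").reset_index(drop=True)

# Coerce unparseable values to NaN / NaT instead of raising.
df["price"] = pd.to_numeric(df["price"], errors="coerce")
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")

# Linear interpolation fills the missing price from its neighbouring rows.
df["price"] = df["price"].interpolate(method="linear")

# Series.map translates codes through a lookup dict (hypothetical labels).
status_labels = {"S": "shipped", "P": "pending", "C": "cancelled"}
df["status"] = df["status"].map(status_labels)
```

Interpolation only makes sense when row order carries meaning, for example rows sorted by date; for unordered data a median or group-wise fill is usually the safer choice.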

3. Aggregation & Grouping

GroupBy operations
Aggregation functions (sum, mean, count, etc.)
Pivot tables and cross-tabulation
Multi-level indexing
Window functions (rolling, expanding)

Code Example:

import pandas as pd

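# Sample data; the columns used below (department, salary, employee_id, region,
# product_category, quarter, product_id, sales) are assumed to exist in the file
df = pd.read_csv('sales.csv')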

# GroupBy aggregation
dept_stats = df.groupby('department').agg({
    'salary': ['mean', 'min', 'max'],
    'employee_id': 'count'
})

# Multiple groupby
sales_by_region_product = df.groupby(['region', 'product_category'])['sales'].sum()

# Pivot table
pivot = df.pivot_table(
    values='sales',
    index='product_category',
    columns='quarter',
    aggfunc='sum',
    fill_value=0
)

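# Rolling window (moving average over the last 7 rows per product).
# This is a 7-day average only if rows are sorted by date with one row per day.
df['sales_ma_7d'] = df.groupby('product_id')['sales'].transform(
    lambda x: x.rolling(window=7, min_periods=1).mean()
)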

# Cumulative sum
df['cumulative_sales'] = df.groupby('product_id')['sales'].cumsum()
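
Cross-tabulation, multi-level indexing, and expanding windows appear in the topic list but not in the example above. The sketch below uses a small invented sales table to show one way each might look.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West", "East", "West"],
        "product_category": ["Toys", "Books", "Toys", "Toys", "Toys", "Books"],
        "sales": [100, 50, 80, 120, 60, 40],
    }
)

# Cross-tabulation: count of rows per region/category pair.
counts = pd.crosstab(df["region"], df["product_category"])

# Grouping by two keys yields a Series with a two-level MultiIndex.
totals = df.groupby(["region", "product_category"])["sales"].sum()
east_toys = totals.loc[("East", "Toys")]  # select with a tuple of labels
wide = totals.unstack(fill_value=0)       # inner level becomes columns

# Expanding window: running mean over all rows seen so far.
df["running_mean"] = df["sales"].expanding().mean()
```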

4. Data Visualization

Matplotlib basics
Seaborn for statistical plots
Pandas built-in plotting
Customizing plots
Creating dashboards

Code Example:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set style
sns.set_style('whitegrid')

# Load data
df = pd.read_csv('sales_data.csv')

# 1. Line plot - Sales trend over time
df.groupby('month')['sales'].sum().plot(kind='line', figsize=(10, 6))
plt.title('Monthly Sales Trend')
plt.xlabel('Month')
plt.ylabel('Total Sales ($)')
plt.show()

# 2. Bar plot - Sales by category
category_sales = df.groupby('category')['sales'].sum().sort_values(ascending=False)
category_sales.plot(kind='bar', figsize=(10, 6))
plt.title('Sales by Category')
plt.xlabel('Category')
plt.ylabel('Total Sales ($)')
plt.xticks(rotation=45)
plt.show()

# 3. Histogram - Price distribution
df['price'].hist(bins=30, figsize=(10, 6))
plt.title('Price Distribution')
plt.xlabel('Price ($)')
plt.ylabel('Frequency')
plt.show()

# 4. Box plot - Salary by department
df.boxplot(column='salary', by='department', figsize=(10, 6))
plt.title('Salary Distribution by Department')
plt.suptitle('')
plt.show()

# 5. Heatmap - Correlation matrix
corr = df[['age', 'salary', 'years_experience']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Matrix')
plt.show()

Hands-On Practice

Project 1: Customer Analytics

Analyze customer purchase behavior and segmentation.

Requirements:

Load customer transaction data
Clean and prepare dataset
Calculate RFM (Recency, Frequency, Monetary) metrics
Customer segmentation
Visualize insights
Generate executive summary
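
As a starting point for the RFM requirement, here is a minimal sketch on invented transactions. The snapshot date and the segment thresholds are assumptions chosen for illustration, not values prescribed by the skill.

```python
import pandas as pd

tx = pd.DataFrame(
    {
        "customer_id": ["a", "a", "b", "c", "c", "c"],
        "order_date": pd.to_datetime(
            ["2024-01-05", "2024-03-01", "2024-01-20", "2024-02-10", "2024-02-25", "2024-03-10"]
        ),
        "amount": [40.0, 60.0, 25.0, 10.0, 15.0, 30.0],
    }
)

# Reference date for recency: one day after the last order (an assumption).
snapshot = tx["order_date"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("customer_id").agg(
    last_order=("order_date", "max"),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)
rfm["recency_days"] = (snapshot - rfm["last_order"]).dt.days

# A deliberately simple segmentation rule with hypothetical thresholds.
rfm["segment"] = "regular"
rfm.loc[(rfm["recency_days"] <= 30) & (rfm["frequency"] >= 2), "segment"] = "active"
rfm.loc[rfm["recency_days"] > 45, "segment"] = "at_risk"
```

Real projects often score each metric into quantiles (for example with pd.qcut) rather than using fixed cut-offs; the fixed rule here just keeps the sketch short.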

Key Skills: Data cleaning, aggregation, visualization

Project 2: Time Series Analysis

Analyze sales trends and forecast future performance.

Requirements:

Load time series data
Handle missing dates
Calculate moving averages
Identify trends and seasonality
Detect anomalies
Create interactive visualizations
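
The first few requirements can be sketched as follows on a hypothetical daily series with two missing dates and one spike. The anomaly rule, a distance of more than five median absolute deviations from the median, is one simple choice among many and is an assumption of this example.

```python
import pandas as pd

# Hypothetical daily sales: Jan 4 and Jan 8 are missing, Jan 6 is a spike.
sales = pd.Series(
    [100.0, 102.0, 98.0, 101.0, 300.0, 99.0, 100.0, 103.0],
    index=pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-05",
         "2024-01-06", "2024-01-07", "2024-01-09", "2024-01-10"]
    ),
    name="sales",
)

# Reindex onto a complete daily calendar, then fill the new gaps
# with time-weighted interpolation.
full_index = pd.date_range(sales.index.min(), sales.index.max(), freq="D")
daily = sales.reindex(full_index).interpolate(method="time")

# 3-day moving average.
moving_avg = daily.rolling(window=3, min_periods=1).mean()

# Flag points far from the median, scaled by the median absolute deviation.
median = daily.median()
mad = (daily - median).abs().median()
anomalies = daily[(daily - median).abs() > 5 * mad]
```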

Key Skills: Time series operations, rolling windows, plotting

Project 3: Data Quality Report

Build an automated data quality assessment tool.

Requirements:

Check for missing values
Identify duplicates
Detect outliers
Validate data types
Generate quality metrics
Export HTML report
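
One possible shape for such a tool is sketched below. The helper name quality_report, the 1.5 * IQR outlier rule, and the sample data are all assumptions made for this example.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column metrics; outliers use the 1.5 * IQR rule on numeric columns."""
    rows = []
    for col in df.columns:
        series = df[col]
        outliers = 0
        if pd.api.types.is_numeric_dtype(series):
            q1, q3 = series.quantile(0.25), series.quantile(0.75)
            iqr = q3 - q1
            outliers = int(((series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)).sum())
        rows.append(
            {
                "column": col,
                "dtype": str(series.dtype),
                "missing": int(series.isna().sum()),
                "outliers": outliers,
            }
        )
    return pd.DataFrame(rows).set_index("column")


# Hypothetical data with one duplicate row, two missing cells, one extreme price.
df = pd.DataFrame(
    {
        "order_id": [1, 2, 2, 3, 4, 5],
        "price": [10.0, 12.0, 12.0, None, 11.0, 500.0],
        "category": ["a", "b", "b", None, "a", "a"],
    }
)

report = quality_report(df)
duplicate_rows = int(df.duplicated().sum())
html = report.to_html()  # write this string to a file to export the report
```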

Key Skills: Data validation, statistical analysis, reporting

Assessment Criteria

[ ] Load and clean real-world datasets efficiently
[ ] Perform complex data transformations
[ ] Use GroupBy for aggregations
[ ] Create insightful visualizations
[ ] Handle missing and inconsistent data
[ ] Optimize performance for large datasets
[ ] Document analysis with clear explanations
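
The large-dataset criterion has no accompanying example in the skill. The sketch below shows three commonly used techniques: chunked reading, narrower dtypes, and vectorized arithmetic. The generated data stands in for a large file on disk, and the chunk size is arbitrary.

```python
import io

import pandas as pd

# StringIO stands in for a large CSV file on disk (hypothetical data).
csv_text = "region,units\n" + "\n".join(
    f"{'north' if i % 2 else 'south'},{i % 50}" for i in range(10_000)
)

# 1. Process the file in chunks instead of loading it all at once.
totals = {}
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2_000):
    for region, units in chunk.groupby("region")["units"].sum().items():
        totals[region] = totals.get(region, 0) + int(units)

# 2. Shrink memory with narrower dtypes.
df = pd.read_csv(io.StringIO(csv_text))
before = df.memory_usage(deep=True).sum()
df["region"] = df["region"].astype("category")
df["units"] = pd.to_numeric(df["units"], downcast="unsigned")
after = df.memory_usage(deep=True).sum()

# 3. Prefer vectorized expressions over row-wise apply.
df["double_units"] = df["units"] * 2
```

Comparing before and after gives a quick check that the dtype changes actually reduced memory for the data at hand.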

Resources

Official Documentation

Pandas Docs - Official documentation
NumPy Docs - NumPy documentation
Matplotlib Docs - Plotting library

Learning Platforms

Kaggle - Free Pandas course
DataCamp - Interactive courses
Python for Data Analysis - Wes McKinney's book

Tools

Jupyter Notebook - Interactive development
Google Colab - Cloud notebooks
Anaconda - Data science distribution

Next Steps

After mastering Pandas, explore:

Scikit-learn - Machine learning
SQL - Database querying
Apache Spark - Big data processing
Tableau/Power BI - Business intelligence tools


How it compares

Pick Pandas Data Analysis for in-session tabular ETL and charts in Python; use database deployment skills when moving analytics storage to managed cloud services.

FAQ

Which libraries does pandas data analysis use?

Pandas Data Analysis uses Pandas for manipulation, NumPy for numerical operations, and Matplotlib for visualization; the visualization example in SKILL.md also imports Seaborn. Version 2.1.0 covers loading from CSV, Excel, SQL, and APIs, plus cleaning, transforming, and charting workflows.

What data sources does pandas data analysis support?

Pandas Data Analysis supports CSV, Excel, SQL, and API tabular sources. Developers use it to clean messy datasets, perform aggregations, and visualize results when building analytics features or exploring data in agent sessions.

Is Pandas Data Analysis safe to install?

skills.sh reports that 3 of 3 security scanners passed, while the catalog's security screen lists the skill as MEDIUM risk. Review the Security Audits panel on this page before installing in production.

Data Science & ML, analytics, pipelines
