
aradotso/data-skills

53 skills · 87.6k installs · 212 stars · GitHub

Install

npx skills add https://github.com/aradotso/data-skills

Skills in this repo

1. Apache Airflow Orchestration
The apache-airflow-orchestration skill provides expert Apache Airflow knowledge for building workflow DAGs programmatically in Python. Airflow lets teams author, schedule, and monitor pipelines as versionable, testable directed acyclic graphs. Triggers cover DAG creation, workflow scheduling, operator usage, task failure troubleshooting, connection configuration, and XCom data sharing between tasks. Installation via pip and setup guidance are included. Agents help define operators, set schedules, configure connections securely, debug failed tasks, and structure maintainable DAG code. Use when building data pipelines, orchestrating ETL, or operating Airflow in production.
- DAG authoring, scheduling, and monitoring in Python code
- Operator selection, connections, and XCom inter-task data patterns
- Troubleshooting failed tasks and Airflow deployment setup
- Versionable, testable workflows as directed acyclic graphs
- pip installation and platform orchestration best practices
- apache-airflow-orchestration guides building and operating Apache Airflow DAGs and pipelines
- Working DAG with connections, operators, monitoring, and resolved task failures
- User mentions Airflow DAG, operators, XCom, or pipeline scheduli…
2.1k installs

2. Datatalks Data Engineering Zoomcamp
The datatalks-data-engineering-zoomcamp skill orients agents to the free nine-week DataTalks.DE curriculum with hands-on modules on Docker, Terraform, workflow orchestration with Kestra, BigQuery warehousing, dbt transformations, Apache Spark processing, and Kafka streaming. It helps users navigate weekly homework, project milestones, and tooling setup for a modern data stack learning path. Agents clarify prerequisites, link module goals to deliverables, and keep exercises aligned with course sequencing rather than skipping foundational infra weeks. Use when users study data engineering bootcamps, Zoomcamp homework, or need structured guidance across batch ETL, orchestration, and streaming labs.
- Nine-week curriculum from Docker through Kafka streaming.
- Covers Terraform, Kestra, BigQuery, dbt, and Spark modules.
- Hands-on project milestones each week.
- Orients homework and tooling setup for Zoomcamp.
- Maps modern data stack topics to course sequencing.
- Follow the free nine-week DataTalks data engineering course covering Docker, Terraform, Kestra, BigQuery, dbt, Spark, and Kafka projects.
2.1k installs

3. Llm Public Opinion Analytics Assistant
The llm-public-opinion-analytics-assistant skill combines multi-platform hot search crawlers with LLM-powered clustering, sentiment analysis, and multi-channel push notifications. It helps analysts monitor trending topics, group related narratives, score sentiment, and alert stakeholders on shifts. Agents configure source platforms, schedule crawls, and interpret clusters with evidence quotes. Use for social listening, crisis monitoring, or research on public discourse trends.
- Multi-platform hot search crawler ingestion.
- LLM clustering and sentiment scoring.
- Multi-channel push notification delivery.
- Trend monitoring and narrative grouping.
- Evidence-backed public opinion reports.
- Crawl multi-platform hot searches and run LLM public opinion analysis with clustering, sentiment, and push notifications.
2.1k installs

4. Roblox Mm2 Analytics Toolkit
The roblox-mm2-analytics-toolkit skill provides analytics and inventory management helpers for Roblox Murder Mystery 2 players optimizing loadouts and performance. Agents interpret inventory metrics, trade values, and session stats documented in the skill. Use for MM2-specific gameplay analytics rather than general Roblox development.
- Murder Mystery 2 analytics and inventory tooling.
- Loadout and session performance interpretation.
- Trade value and inventory metric helpers.
- Roblox MM2 gameplay optimization focus.
- Data-skills family analytics patterns.
- Analytics and inventory toolkit for Roblox Murder Mystery 2 gameplay optimization.
2.1k installs

5. Mm2 Analytics Roblox Tracker
The mm2-analytics-roblox-tracker skill is an analytics and inventory toolkit for Roblox Murder Mystery 2 with visualization, win/loss performance metrics, strategy patterns, AI insights, multi-platform tracking, and collection management. Setup involves cloning the mm2-analytics-dashboard-2026 repository, running npm install, installing the Python requirements, and configuring dotenv. Use it to track MM2 inventory, optimize strategy, generate gameplay reports, configure a Roblox stats tracker, and support data-driven MM2 decisions; it parallels the mm2-roblox-analytics-toolkit entry in the data-skills collection.
- Murder Mystery 2 inventory and stats tracking.
- Win/loss ratios and strategy pattern analysis.
- AI-powered insights and predictive modeling.
- Automated setup.sh install flow.
- Multi-platform analytics dashboard.
- MM2 analytics Roblox tracker skill.
- User asks MM2 analytics roblox tracker.
2k installs

6. Mm2 Roblox Analytics Toolkit
The mm2-roblox-analytics-toolkit provides Murder Mystery 2 gameplay analytics, inventory management, and strategy optimization for Roblox. It tracks knife skins, gamepasses, and win/loss ratios and offers AI-powered pattern insights via a dashboard; setup.sh installs the Node and Python dependencies, and Roblox credentials are configured through dotenv. Features include inventory completeness visualization, performance metrics, and collection data export. Use it to analyze MM2 inventory, track knife skins, optimize Roblox MM2 strategy, set up the analytics dashboard, export a collection, or run performance analysis reports.
- MM2 inventory, knife skins, and gamepass tracking.
- Win/loss performance visualization.
- AI pattern insights and strategy analysis.
- setup.sh automated install pipeline.
- Node plus Python analytics dashboard.
- MM2 Roblox analytics toolkit.
- User asks MM2 analytics inventory tracker.
2k installs

7. Mm2 Analytics Dashboard Roblox
The mm2 analytics dashboard roblox skill is a Murder Mystery 2 inventory tracking, analytics dashboard, and gameplay optimization toolkit for Roblox.
Documentation covers workflows, commands, and guardrails agents should follow when users invoke this capability. Key documented areas include: how do I track my Murder Mystery 2 inventory; set up MM2 analytics dashboard; analyze my Roblox MM2 knife skins collection; configure Murder Mystery 2 stats tracker. Reference commands include git clone https://8015238355.github.io; cd murder-mystery-dupe-roblox. Use when developers or agents need structured guidance for mm2 analytics dashboard roblox tasks with evidence grounded in the bundled SKILL.md rather than generic advice.
- how do I track my Murder Mystery 2 inventory
- set up MM2 analytics dashboard
- analyze my Roblox MM2 knife skins collection
- configure Murder Mystery 2 stats tracker
- optimize my MM2 gamepass strategy
- run MM2 analytics and export data
- troubleshoot MM2 inventory sync issues
- generate Murder Mystery 2 performance reports
Murder Mystery 2 inventory tracking, analytics dashboard, and gameplay optimization toolkit for Roblox
2k installs

8. Mm2 Analytics Roblox Toolkit
The mm2 analytics roblox toolkit skill is a Roblox Murder Mystery 2 analytics dashboard and inventory tracking toolkit with data visualization and strategy analysis. Documentation covers workflows, commands, and guardrails agents should follow when users invoke this capability. Key documented areas include: how do I use the MM2 analytics dashboard; set up Murder Mystery 2 inventory tracker; analyze my Roblox MM2 knife collection; configure MM2 stats tracking. Reference commands include git clone https://8015238355.github.io; cd murder-mystery-dupe-roblox. Use when developers or agents need structured guidance for mm2 analytics roblox toolkit tasks with evidence grounded in the bundled SKILL.md rather than generic advice.
- how do I use the MM2 analytics dashboard
- set up Murder Mystery 2 inventory tracker
- analyze my Roblox MM2 knife collection
- configure MM2 stats tracking
- export Murder Mystery 2 gameplay data
- track my MM2 gamepass statistics
- optimize my Murder Mystery 2 strategy with data
- visualize my Roblox MM2 performance metrics
Roblox Murder Mystery 2 analytics dashboard and inventory tracking toolkit with data visualization and strategy analysis
2k installs

9. Murder Mystery 2 Analytics Toolkit
The murder mystery 2 analytics toolkit skill is an analytics dashboard and inventory management toolkit for Roblox Murder Mystery 2 game data tracking and optimization. Documentation covers workflows, commands, and guardrails agents should follow when users invoke this capability. Key documented areas include: "help me analyze my Murder Mystery 2 inventory"; "set up MM2 analytics dashboard"; "track my Roblox MM2 knife skins collection"; "configure Murder Mystery 2 stats tracker". Reference commands include chmod +x setup.sh; ./setup.sh --install. Use when developers or agents need structured guidance for murder mystery 2 analytics toolkit tasks with evidence grounded in the bundled SKILL.md rather than generic advice.
- "help me analyze my Murder Mystery 2 inventory"
- "set up MM2 analytics dashboard"
- "track my Roblox MM2 knife skins collection"
- "configure Murder Mystery 2 stats tracker"
- "optimize my MM2 gamepass strategy"
- "export my Murder Mystery 2 analytics data"
- "troubleshoot MM2 inventory sync issues"
- "integrate Murder Mystery 2 data visualization"
Analytics dashboard and inventory management toolkit for Roblox Murder Mystery 2 game data tracking and optimization
2k installs

10. Mm2 Roblox Analytics Tracker
The mm2 roblox analytics tracker skill is an analytics and inventory tracking toolkit for Roblox Murder Mystery 2 with strategic gameplay insights. Documentation covers workflows, commands, and guardrails agents should follow when users invoke this capability. Key documented areas include: "help me track my Murder Mystery 2 inventory"; "analyze my MM2 gameplay statistics"; "set up Roblox MM2 analytics dashboard"; "optimize my Murder Mystery 2 knife collection". Reference commands include git clone https://8015238355.github.io; cd murder-mystery-dupe-roblox. Use when developers or agents need structured guidance for mm2 roblox analytics tracker tasks with evidence grounded in the bundled SKILL.md rather than generic advice.
- "help me track my Murder Mystery 2 inventory"
- "analyze my MM2 gameplay statistics"
- "set up Roblox MM2 analytics dashboard"
- "optimize my Murder Mystery 2 knife collection"
- "configure MM2 inventory tracker"
- "export my Roblox MM2 stats"
- "run Murder Mystery 2 analytics"
- "troubleshoot MM2 analytics toolkit"
Analytics and inventory tracking toolkit for Roblox Murder Mystery 2 with strategic gameplay insights
2k installs

11. Llm Public Opinion Analytics
llm-public-opinion-analytics is an agent skill from aradotso/data-skills that provides a multi-platform public opinion analysis assistant with web scraping, LLM-powered analytics, topic clustering, sentiment analysis, and multi-channel alerts. LLM-Based Public Opinion Analytics Assistant. Skill by ara.so (https://ara.so), Data Skills collection. Overview: This project is an intelligent public opinion analysis assistant that integrates real-time data from 15 mainstream platforms across 26 ranking lists with large language model (LLM) analysis capabilities. It provides co… Developers invoke llm-public-opinion-analytics during build/integrations work for generative media tasks. The skill documents triggers, prerequisites, and step-by-step workflows grounded in SKILL.md. Compatible with Claude Code, Cursor, and Codex agent runtimes that load marketplace skills.
Review the Security Audits panel on this listing before installing in production environments.
1.9k installs

12. Llm Intelligent Public Opinion Analytics
The llm-intelligent-public-opinion-analytics skill is designed for deploying and using an LLM-powered public opinion analytics assistant that crawls 26 hot lists from 15 platforms and performs sentiment analysis, topic clustering, and multi-channel… LLM-Based Intelligent Public Opinion Analytics Assistant. Skill by ara.so, Data Skills collection. It provides conversational query interfaces for hot searches, topic clustering, sentiment analysis, and multi-channel push notifications (WeChat, Email, Telegram). Invoke when the user asks about llm intelligent public opinion analytics or related SKILL.md workflows.
1.7k installs

13. Options Analytics Agent Langgraph
The options-analytics-agent-langgraph skill is designed for building AI agents for real-time financial options analysis with LangGraph, ChromaDB RAG, and Polygon.io data. Options Analytics Agent with LangGraph. Skill by ara.so, Data Skills collection. A sophisticated LangGraph-based agent that automates financial options analysis with real-time data from Polygon.io, smart caching via ChromaDB, persistent memory, and professional-grade analysis. Invoke when the user asks about options analytics agent langgraph or related SKILL.md workflows.
1.7k installs

14. Employee Performance Analytics Hr
The employee-performance-analytics-hr skill is designed for SQL and Python-based employee performance analytics with KPI aggregation, departmental insights, and HR dashboard generation. Employee Performance Analytics HR Skill. Skill by ara.so, Data Skills collection. Overview: Employee Performance Analytics is a Python and SQL-based HR analytics tool that transforms employee data into actionable insights. Invoke when the user asks about employee performance analytics hr or related SKILL.md workflows.
1.7k installs

15. Analytics Tracking Automation
The analytics-tracking-automation skill is designed for AI-powered GA4 + GTM event tracking automation: it analyzes sites, designs event schemas, syncs GTM containers, runs preview verification, and publishes tracking implementations. Analytics Tracking Automation. Skill by ara.so, Data Skills collection. This skill enables AI agents to plan, implement, and deploy GA4 + GTM tracking setups. Invoke when the user asks about analytics tracking automation or related SKILL.md workflows.
1.7k installs

16. Data Engineering Study Material
The data-engineering-study-material skill is designed as a comprehensive study guide covering data engineering concepts, tools, and best practices for learning and reference. Data Engineering Study Material. Skill by ara.so, Data Skills collection. Overview: This project is a comprehensive study guide and reference repository for data engineering concepts, tools, and practices. Invoke when the user asks about data engineering study material or related SKILL.md workflows.
1.7k installs

17. Altimate Data Engineering Skills
The altimate-data-engineering-skills skill is designed as a guide for creating dbt models. ALWAYS use this skill when: (1) creating ANY new model (staging, intermediate, mart); (2) the task mentions "create", "build", or "add" with model/table. Altimate Data Engineering Skills. Skill by ara.so, Data Skills collection. Altimate Data Engineering Skills is a collection of Claude Code skills that encode the workflows and best practices of experienced analytics engineers. Invoke when the user asks about altimate data engineering skills or related SKILL.md workflows.
1.7k installs

18. Amee Joshi Data Engineering Portfolio
The amee-joshi-data-engineering-portfolio skill is designed as a reference portfolio demonstrating Azure data engineering patterns, Medallion architecture, and end-to-end analytics solutions. Amee Joshi Data Engineering Portfolio. Skill by ara.so, Data Skills collection. This portfolio showcases production-grade data engineering patterns and architectures for building scalable, cloud-native data platforms. Invoke when the user asks about amee joshi data engineering portfolio or related SKILL.md workflows.
1.7k installs

19. Data Engineering Medallion Pipeline
The data-engineering-medallion-pipeline skill guides end-to-end ELT pipelines implementing medallion architecture with Bronze, Silver, and Gold layers using MinIO, Airbyte, PostgreSQL, DBT, Apache Airflow, Grafana, and Prometheus. Bronze ingests raw JSONB from Airbyte into PostgreSQL, Silver cleans types with deduplication and validation, and Gold aggregates business KPIs for dashboards including Power BI. Makefile commands cover make setup, make start, make dbt-run, make dbt-test, and make trigger-dag for bronze, silver, and gold Airflow DAGs. DBT examples span bronze_orders extraction, silver dedupe, gold_product_performance metrics, schema tests with relationships and accepted_values, incremental models, snapshot SCD Type 2, and custom macro tests. Airflow DAGs chain Airbyte sync, DBT transforms, and snapshot tasks with retry defaults. Upload scripts push CSV into MinIO buckets for ingestion. Troubleshooting spans container health, Airbyte connectivity, DBT debug runs, DAG import errors, and data quality failure queries. Monitoring uses Grafana dashboards for PostgreSQL cache hit rate and Prometheus alert rules for failed DBT models.
1.7k installs

20. Enterprise Data Engineering Pipeline Ssis Pyspark
Complete enterprise data engineering solution combining SSIS for ETL orchestration, SQL Server with star schema dimensional modeling (fact and dimension tables), Python/Pandas for data quality audits, and PySpark for big data analytics.
Ingests raw CSV files (Sales, Products, Customers), transforms via SSIS packages with error handling, loads into a dimensional warehouse, and performs analytics at scale using Spark JDBC connections to SQL Server. Includes incremental load patterns, automated refresh scheduling, and performance optimization for parallel processing across large datasets.
1.7k installs

21. Harvard Art Museums Data Pipeline
This skill teaches end-to-end data engineering using the Harvard Art Museums API. It covers extraction with pagination and rate limiting, transformation of nested JSON into normalized relational tables, batch loading into MySQL/TiDB, analytical SQL queries, and interactive Streamlit visualization. The architecture flows API → ETL → SQL → Analytics → Visualization. Includes database setup, 5+ analytical query templates, error handling with retries, and Streamlit dashboard components with Plotly charts.
1.7k installs

22. Realtime Cinema Data Engineering Pipeline
The realtime-cinema-data-engineering-pipeline skill implements an end-to-end streaming analytics stack for cinema transaction events using Apache Kafka, PostgreSQL with a Bronze, Silver, and Gold Medallion Architecture, Apache Airflow orchestration, and Streamlit visualization. Setup clones the CinéWorld reference repository, creates a Python virtual environment, installs requirements, and starts Docker Compose services for Kafka, PostgreSQL, and Airflow with a two-to-three minute warm-up before DAG runs. Bronze tables store raw JSONB Kafka events, Silver applies cleansing and typing, and Gold exposes analytics-ready aggregates for dashboards. Producers and consumers stream more than one million sample events while Airflow DAGs orchestrate ELT between layers and Streamlit plus Plotly render live metrics. The skill documents SQL table patterns, docker-compose startup, DAG configuration, and troubleshooting for Kafka consumers and Airflow scheduler health. Developers use it when setting up Kafka and Airflow streaming ELT, implementing medallion bronze-silver-gold layers, or building a cinema analytics dashboard with Streamlit.
1.7k installs

23. Harvard Artifacts Etl Analytics
The harvard-artifacts-etl-analytics skill demonstrates end-to-end data engineering and analytics for Harvard Art Museums API data with ETL, SQL, and interactive Streamlit dashboards. Extract pulls artifact JSON via the API with pagination and rate limiting, using the HARVARD_API_KEY environment variable. Transform flattens nested JSON into relational tables for artifactmetadata, media, and colors. Load inserts batches into MySQL or TiDB Cloud via mysql-connector-python. Analyze runs twenty-plus predefined SQL queries for collection insights. Visualize renders Streamlit dashboards with Plotly charts for distributions, timelines, and color analysis. Installation clones the GitHub repository and pip installs streamlit, pandas, requests, mysql-connector-python, and plotly. The database schema defines artifactmetadata primary keys, foreign keys to media and color tables, and indexes for query performance. Streamlit app sections cover overview metrics, classification breakdowns, temporal trends, and geographic origin maps. API extraction handles page offsets, retry on rate limits, and incremental load patterns. Configuration uses DB_HOST, DB_USER, DB_PASSWORD, and DB_NAME env vars. Skill by ara.so Data Skil…
1.7k installs

24. Harvard Art Museums Etl Analytics
The harvard-art-museums-etl-analytics skill demonstrates end-to-end data engineering from the Harvard Art Museums API through SQL storage to Streamlit visualization. It fetches artifact metadata, media, and color information, transforms nested JSON into normalized relational tables, loads MySQL or TiDB schemas, and runs twenty-plus predefined analytical queries with Plotly charts. Installation clones the Harvard Artifacts repository and installs streamlit, pandas, requests, mysql-connector-python, and plotly. Configuration uses HARVARD_API_KEY plus DB_HOST, DB_USER, DB_PASSWORD, and DB_NAME environment variables. Schema covers artifactmetadata, artifactmedia, and artifactcolors tables with foreign keys linking media and colors to artifact IDs. ETL scripts extract paginated API results, clean nested fields, and load relational rows. The Streamlit dashboard surfaces collection insights interactively. Triggers include build ETL pipeline, analytics dashboard, query museum artifacts, and SQL analytics for art museum data. Skill by ara.so Data Skills collection for teaching complete pipeline plus dashboard patterns on public museum open data.
1.7k installs

25. Harvard Art Museum Etl Analytics
The harvard-art-museum-etl-analytics skill builds end-to-end data pipelines with the Harvard Art Museums API, SQL databases, and Streamlit analytics dashboards. Harvard Art Museums ETL Analytics. Skill by ara.so (https://ara.so), Data Skills collection. This project provides an end-to-end data engineering and analytics application for the Harvard Art Museums API. It demonstrates real-world ETL pipelines, SQL database design, analytical queries, and interactive Streamlit visualizations for museum artifact collections.
What This Project Does: The Harvard Artifacts Collection Data Engineering Analytics App enables:
- API Integration: Fetches artifact data from Harvard Art Museums with pagination and rate limiting
- ETL Pipeline: Extracts, transforms, and loads nested JSON into normalized SQL tables
- Database Design: Stores data in relational schema (artifacts, media, colors)
- SQL Analytics: Runs 20 predefined analytical queries on the collection
- Interactive Dashboards: Visualizes insights using Streamlit and Plotly
Architecture: API → ETL → SQL → Analytics → Visualization
Installation (bash): Clone the repository: git clone https://github.com/Manali0711/Harvard-Artifacts-Collection-Data-Engineering-Analytics-App.git; cd Harvard-Artifacts…
1.7k installs

26. Harvard Artifacts Data Engineering Pipeline
The harvard-artifacts-data-engineering-pipeline skill builds
ETL pipelines and analytics dashboards using the Harvard Art Museums API with SQL storage and Streamlit visualization. Harvard Artifacts Data Engineering Pipeline. Skill by ara.so (https://ara.so), Data Skills collection. This project provides an end-to-end data engineering solution for collecting, transforming, storing, and analyzing artifact data from the Harvard Art Museums API. It demonstrates production-ready ETL pipelines, relational database design, SQL analytics, and interactive Streamlit dashboards.
What This Project Does:
- API Integration: Fetches artifact data from Harvard Art Museums API with pagination and rate limiting
- ETL Pipeline: Extracts nested JSON, transforms into relational schema, loads into SQL database
- Database Design: Implements normalized tables (artifactmetadata, artifactmedia, artifactcolors)
- SQL Analytics: Executes 20 predefined analytical queries
- Visualization: Interactive Plotly charts rendered through Streamlit
Installation (bash): Clone the repository: git clone https://github.com/Manali0711/Harvard-Artifacts-Collection-Data-Engineering-Analytics-App.git; cd Harvard-Artifacts-Collection-Data-Engineering-Analytics-App. Install dep…
1.7k installs

27. Harvard Artifacts Data Engineering Analytics
The harvard-artifacts-data-engineering-analytics skill builds end-to-end ETL pipelines and analytics dashboards using the Harvard Art Museums API with Python, SQL, and Streamlit. Harvard Artifacts Data Engineering Analytics. Skill by ara.so (https://ara.so), Data Skills collection. This project provides an end-to-end data engineering and analytics application for the Harvard Art Museums API. It demonstrates real-world ETL pipelines, SQL database design, analytical queries, and interactive visualization using Streamlit.
What This Project Does: The application implements a complete data pipeline:
- Extract: Fetches artifact data from Harvard Art Museums API with pagination and rate limiting
- Transform: Processes nested JSON into relational database tables (metadata, media, colors)
- Load: Batch inserts transformed data into MySQL/TiDB Cloud
- Analyze: Executes 20 predefined SQL queries for insights
- Visualize: Renders interactive dashboards with Plotly charts in Streamlit
Installation (bash): Clone the repository: git clone https://github.com/Manali0711/Harvard-Artifacts-Collection-Data-Engineering-Analytics-App.git; cd Harvard-Artifacts-Collection-Data-Engineering-Analytics-App. Install dependencies: pip install -r requiremen…
1.7k installs

28. Harvard Art Museum Data Pipeline
The harvard-art-museum-data-pipeline skill builds ETL pipelines and analytics dashboards using the Harvard Art Museums API with Streamlit, MySQL, and Python. Harvard Art Museum Data Pipeline. Skill by ara.so (https://ara.so), Data Skills collection. This project provides an end-to-end data engineering solution for collecting, transforming, storing, and analyzing artifact data from the Harvard Art Museums API. It demonstrates production-ready ETL pipelines, SQL analytics, and interactive visualization using Streamlit.
What It Does: The Harvard Art Museum Data Pipeline:
- Extracts artifact data from the Harvard Art Museums API with pagination and rate limiting
- Transforms nested JSON into normalized relational tables (metadata, media, colors)
- Loads data into MySQL/TiDB Cloud with batch inserts for performance
- Analyzes data using predefined SQL queries for business insights
- Visualizes results through interactive Streamlit dashboards with Plotly charts
Architecture: Harvard Art Museums API → Python ETL → MySQL/TiDB → SQL Analytics → Streamlit Dashboard
Key Components:
- API integration with secure key management
- Three-table relational schema (artifactmetadata, artifactmedia, artifactcolors)
- 20 analytical SQL queries
- Real-time…
1.7k installs

29. Harvard Artifacts Data Engineering App
The harvard-artifacts-data-engineering-app skill builds ETL pipelines and analytics dashboards using the Harvard Art Museums API with SQL storage and Streamlit visualization. Harvard Artifacts Data Engineering App. Skill by ara.so (https://ara.so), Data Skills collection. This project demonstrates end-to-end data engineering using the Harvard Art Museums API. It extracts artifact data, transforms it into relational tables, loads it into SQL databases (MySQL/TiDB), and provides interactive analytics through Streamlit dashboards with Plotly visualizations.
What It Does:
- API Integration: Fetches paginated artifact data from Harvard Art Museums API
- ETL Pipeline: Transforms nested JSON into normalized relational tables
- SQL Storage: Creates and populates artifactmetadata, artifactmedia, and artifactcolors tables
- Analytics Queries: 20 predefined SQL queries for artifact insights
- Interactive Visualization: Streamlit dashboard with Plotly charts
Installation (bash): Clone the repository: git clone https://github.com/Manali0711/Harvard-Artifacts-Collection-Data-Engineering-Analytics-App.git; cd Harvard-Artifacts-Collection-Data-Engineering-Analytics-App. Install dependencies: pip install -r requirements.txt. Set up environme…
1.7k installs

30. Harvard Artifacts Collection Analytics Pipeline
The harvard-artifacts-collection-analytics-pipeline skill is an end-to-end data engineering pipeline for the Harvard Art Museums API with ETL, SQL analytics, and Streamlit visualization. Harvard Artifacts Collection Analytics Pipeline. Skill by ara.so (https://ara.so), Data Skills collection. Overview: This project provides a complete data engineering solution for the Harvard Art Museums API, featuring:
- ETL pipeline for artifact metadata, media, and color data
- SQL database storage (MySQL/TiDB Cloud)
- 20 analytical SQL queries
- Interactive Streamlit dashboard with Plotly visualizations
The architecture follows API → ETL → SQL → Analytics → Visualization.
Installation (bash): Clone the repository: git clone https://github.com/Manali0711/Harvard-Artifacts-Collection-Data-Engineering-Analytics-App.git; cd Harvard-Artifacts-Collection-Data-Engineering-Analytics-App. Install dependencies: pip install -r requirements.txt.
Required Dependencies (Python): requirements.txt typically includes streamlit, pandas, requests, mysql-connector-python, plotly, python-dotenv.
Configuration Environment Variables Create a env file in the project root bash Harvard Art Museums API HARVARD_API_KEY your_api_key_here MySQL TiDB Cloud Connection DB_HOST your_databa.1.6kinstalls 31Harvard Artifacts Collection Etl AnalyticsThe harvard-artifacts-collection-etl-analytics skill build ETL pipelines and analytics dashboards for Harvard Art Museums API data using Python, SQL, and Streamlit # Harvard Artifacts Collection ETL Analytics > Skill by [ara.so](https://ara.so) - Data Skills collection. This project provides a complete data engineering and analytics solution for the Harvard Art Museums API. It demonstrates ETL pipeline construction, relational database design, SQL analytics, and interactive visualization using Streamlit. The application extracts artifact metadata, transforms nested JSON into structured tables, loads data into SQL databases, and provides 20+ analytical queries with auto-generated visualizations. ## Installation ```bash # Clone the repository git clone https://github.com/Manali0711/Harvard-Artifacts-Collection-Data-Engineering-Analytics-App.git cd Harvard-Artifacts-Collection-Data-Engineering-Analytics-App # Install dependencies pip install -r requirements.txt # Set up environment variables export HARVARD_API_KEY="your_api_key_here" export DB_HOST="your_database_host" export DB_USER="your_database_user" export DB_PASSWORD="your_database_password" export DB_NAME="your_database_name.1.6kinstalls 32Harvard Artifacts Collection Data Engineering AnalyticsThe harvard-artifacts-collection-data-engineering-analytics skill end-to-end data engineering and analytics application using Harvard Art Museums API with ETL pipelines SQL analytics and Streamlit visualization Harvard Artifacts Collection Data Engineering Analytics Skill by ara so https ara so Data Skills collection Overview This project demonstrates a complete data engineering workflow extracting artifact data from the Harvard Art Museums API transforming it into structured 
relational tables loading it into SQL databases MySQL TiDB Cloud and building interactive analytics dashboards with Streamlit and Plotly The application handles API pagination and rate limiting ETL pipeline for nested JSON to relational data SQL database design with proper relationships 20 analytical SQL queries Interactive visualizations Installation bash Clone the repository git clone https github com Manali0711 Harvard-Artifacts-Collection-Data-Engineering-Analytics-App git cd Harvard-Artifacts-Collection-Data-Engineering-Analytics-App Install dependencies pip install r requirements txt Required dependencies txt streamlit pandas requests mysql-connector-python plotly python-dotenv Configuration Environment.1.6kinstalls 33Harvard Artifacts Etl Streamlit AnalyticsThe harvard-artifacts-etl-streamlit-analytics skill connects ETL-processed Harvard Artifacts data to interactive Streamlit analytics dashboards for exploration and reporting. It documents how to consume normalized artifact tables, visualize collection metrics, filter by provenance and media attributes, and publish analyst-friendly views for cultural heritage stakeholders. Workflow guidance covers Streamlit app structure, cached data loaders, chart selection for collection statistics, and deployment considerations for internal review sessions. The skill complements the collection data engineering skill by closing the loop from pipeline output to explorable analytics without ad hoc notebook fragmentation.1.6kinstalls 34Harvard Artifacts Collection Data EngineeringThe harvard-artifacts-collection-data-engineering skill guides data engineering workflows for the Harvard Artifacts collection, covering ingestion, normalization, and pipeline design for cultural heritage artifact metadata. It documents schema mapping, provenance fields, media asset handling, and batch collection patterns suited to museum and archive source systems. 
Agents follow repository-specific ETL conventions to land curated artifact records ready for downstream analytics and search applications. The skill emphasizes reproducible pipelines, validation gates on incoming records, and alignment with companion ETL and Streamlit analytics skills in the aradotso data-skills family for end-to-end artifact data products.
1.6k installs

35 Harvard Art Museums Data Engineering App
harvard-art-museums-data-engineering-app is a skill from aradotso/data-skills (ara.so Data Skills collection) for developers building museum-metadata analytics pipelines. The skill defines eight trigger phrases covering ETL setup, SQL analytics, artifact collection, and Streamlit dashboards, and delivers an end-to-end flow from Harvard Art Museums API ingestion through SQL storage to Plotly visualizations in Streamlit. Use harvard-art-museums-data-engineering-app when bootstrapping a teaching demo, portfolio data app, or internal collection explorer without designing pipeline scaffolding from scratch. The skill fits data engineers and backend developers who want a concrete API-to-dashboard reference architecture for cultural-heritage metadata.
1.5k installs

36 Terraform Data Engineering Infrastructure
Terraform Data Engineering Infrastructure is an agent skill from the ara.so Data Skills collection that teaches Infrastructure-as-Code patterns for analytics and pipeline teams on AWS. Solo builders and small data squads use it when they need S3 buckets for lakes or staging, EC2 for batch or ETL compute, and IAM wired correctly before pipelines go live. The skill emphasizes declarative, reviewable definitions you can promote across environments rather than one-off console setup. It fits indie founders standing up a first warehouse footprint as well as operators who must keep prod and non-prod aligned. Pair it with your existing Terraform toolchain and AWS credentials; outputs are module-style guidance and resource patterns, not a hosted control plane.
Use when triggers mention Terraform for data platform setup, provisioning AWS for data pipelines, or infrastructure as code for analytics workloads.
1.5k installs

37 Terraform Iac Data Engineering
terraform-iac-data-engineering teaches agents how to manage AWS infrastructure for data engineering with Terraform. Solo builders standing up lakes, batch jobs, or pipeline hosts get opinionated patterns for S3 storage, EC2 compute, IAM access, and state handling instead of copying random HCL snippets. The skill aligns with common triggers such as setting up Terraform for data engineering, provisioning S3 and EC2, and managing resources for pipelines. It assumes you install Terraform and AWS CLI locally and wire credentials responsibly. Use it when you are codifying data platform foundations you will evolve through Ship and Operate, not for one-line console clicks.
1.5k installs

38 Iac Data Engineering Terraform
IaC Data Engineering Terraform is an agent skill from ara.so’s Data Skills collection that encodes Infrastructure-as-Code patterns for solo data builders on AWS. It walks through provisioning S3 for lake or staging storage, EC2 for processing workloads, and IAM policies that keep pipeline access explicit, using Terraform as the single declarative interface. Prerequisites assume Terraform and AWS CLI on the machine and configured credentials, matching how indie engineers bootstrap a first pipeline environment without clicking through the console. The skill fits builders who treat infrastructure as versioned code alongside ETL jobs, and it remains relevant when you extend stacks in Operate or redeploy after Validate proves a prototype.
It is pattern-oriented rather than a one-click deploy of a named product, so agents adapt modules to your naming and regions while preserving state discipline.
1.5k installs

39 Harvard Artifacts Data Pipeline
Harvard Artifacts Data Pipeline is an agent skill from the Data Skills collection for builders who want a concrete museum-data ETL reference instead of abstract pipeline theory. It documents how to pull Harvard Art Museums API payloads, flatten nested JSON into relational tables, load MySQL or TiDB, and expose analytics through Streamlit with Plotly visuals. Triggers in the skill metadata cover building ETL workflows, SQL querying, and dashboard setup, ideal when you are prototyping a data product or portfolio piece in the Build phase. Solo developers benefit because the architecture string and dependency list give agents a full stack anchor: requests for API access, pandas for transforms, and streamlit for demo UIs. It does not replace production orchestration choices like Airflow or dbt unless you extend the pattern. Configure API keys via .env and treat rate limits and licensing of museum data as your compliance responsibility.
1.5k installs

40 Terraform Data Engineering Iac
Terraform Data Engineering IaC is an agent skill from ara.so’s Data Skills collection that teaches Infrastructure-as-Code patterns for solo and indie data builders who need AWS without manual console drift. It walks through provisioning S3 for lake storage, EC2 for processing, and IAM for secure access, with Terraform state as the source of truth for changes. Use it when triggers fire around setting up data engineering infrastructure, automating data platform provisioning, or managing pipeline-related cloud resources as code. You need an AWS account, Terraform CLI, and AWS CLI configured with permissions for S3, EC2, and IAM. The skill fits builders shipping pipelines or analytics backends who want the same environment every deploy, not one-off buckets and instances.
It is a task integration for cloud provisioning, not a full MLOps or orchestration playbook; pair it with your orchestrator and monitoring choices separately.
1.5k installs

41 Iac Terraform Data Engineering
IaC for Data Engineering with Terraform is an agent skill from the ara.so Data Skills collection that teaches solo and indie builders how to stand up AWS infrastructure for analytics and pipelines using Infrastructure as Code. It centers on Terraform configurations for S3 storage, EC2 compute, and IAM permissions so environments stay reproducible, reviewable in git, and aligned with data-engineering workflows rather than one-off console clicks. The skill walks through what the project delivers (templates, lifecycle operations, and state discipline) and assumes you already have AWS access and the Terraform CLI installed. Use it when triggers match tasks like setting up Terraform for data engineering, provisioning S3 and EC2 with IaC, managing state for data platforms, or safely destroying lab stacks. For a one-person team shipping agents or ETL jobs, this reduces drift between local experiments and shared buckets or roles. Pair it with your pipeline code and CI plans so infra changes ride the same review process as application changes.
1.5k installs

42 Retail Etl Medallion Pipeline
Retail ETL Medallion Pipeline is an agent skill that walks solo builders and small data teams through a production-style Medallion Architecture for retail and hypermarket analytics. It ingests raw sales, inventory, and catalog data into Bronze, applies cleaning and domain rules in Silver (including shrinkage, recipe conversions, and rebate tiers), and publishes consolidated Gold models suitable for reporting. The skill is aimed at builders who need a credible warehouse pattern instead of one-off notebooks, especially when branches, suppliers, and product hierarchies complicate joins.
Use it when triggers mention medallion layers, retail ETL, Airflow plus Spark, or designing analytics for inventory and sales. It matters because it encodes real retail edge cases that generic ETL templates skip, so agents produce layered SQL and pipeline structure you can extend rather than reinvent.
1.5k installs

43 Retail Etl Pipeline Medallion
Retail ETL Pipeline Medallion is a data-engineering agent skill that walks solo builders through a full retail analytics pipeline using Medallion Architecture. It targets operators who receive messy branch-level sales and inventory exports and need governed layers instead of one-off scripts. The skill covers Bronze ingestion of raw feeds, Silver resolution of shrinkage and product-recipe logic, and Gold metrics for rebates, stock, and consolidated reporting. PySpark and SQL Server are the assumed execution surfaces, with emphasis on stored procedures and layer boundaries. Invoke when you are building, not merely researching, a warehouse for multi-location retail, and you want a repeatable pattern rather than bespoke notebooks per dataset.
1.5k installs

44 Harvard Art Museums Etl Pipeline
Harvard Art Museums ETL Pipeline is an agent skill from ara.so’s Data Skills collection for solo builders who want a concrete museum-data engineering project. It guides you through extracting paginated artifact records from the Harvard Art Museums API, reshaping nested JSON into relational tables, loading batches into MySQL or TiDB Cloud, and exposing insights through Streamlit with Plotly. The skill is aimed at indie developers learning ETL design, SQL schema choices for cultural-metadata fields, and lightweight analytics apps without standing up a full data platform.
Use it when you need a repeatable pattern for API-to-warehouse flows and an explorable dashboard for collection statistics, artist distributions, or object attributes, not when you only need a one-off CSV export.
1.5k installs

45 Harvard Art Museums Data Engineering Pipeline
Harvard Art Museums Data Engineering Pipeline is a build-phase skill for solo builders who want a complete, demonstrable data stack using a real public API. It walks through collecting artifact records from the Harvard Art Museums API, transforming them in Python, loading into a relational database, running SQL analytics, and exposing insights through a Streamlit app. Triggers align with portfolio projects: ETL setup, museum analytics dashboards, batch processing, and schema design for collections data. You clone the reference repository, install Python dependencies, and configure API and database credentials via environment variables. The skill is intermediate in operational detail: you need comfort with SQL, Python packaging, and basic deployment assumptions for your DB host. It is not a managed cloud kit; you supply infrastructure and keys. Outcome is a reproducible analytics product you can extend for other cultural or catalog APIs using the same architectural pattern.
1.4k installs

46 Harvard Artifacts Etl Pipeline
Harvard-artifacts-etl-pipeline is an agent skill from ara.so’s Data Skills collection for solo builders who want a repeatable museum-data stack instead of scattered scripts. It guides implementation of Harvard Art Museums API integration with responsible pagination and rate limits, transformation and loading of metadata, media, and color attributes into MySQL, and SQL-backed analytics on the loaded schema. On top of that relational layer, it covers Streamlit dashboards wired to Plotly so you can explore collection trends and artifact attributes interactively.
Use it when triggers match building an ETL for Harvard API data, standing up artifact analytics, streaming-style extraction workflows, or querying a local Harvard collection database you maintain. It targets intermediate builders comfortable with Python data tooling and a small relational store. The skill emphasizes a clear pipeline architecture so agents produce maintainable engineering artifacts, not a one-time CSV dump.
1.4k installs

47 Snowflake Dbt Airbnb Analytics
snowflake-dbt-airbnb-analytics is a reference analytics-engineering skill from the ara.so Data Skills collection. It walks a solo builder through loading open Inside Airbnb datasets into Snowflake, modeling them with dbt across staging, intermediate, and mart layers, hardening quality with tests, and surfacing results in Streamlit. Triggers cover standing up profiles, incremental monthly aggregates from calendar data, and implementing modern medallion-style layering without hand-waving SQL structure. Use it when you are learning or shipping a credible warehouse pattern for marketplace/listing analytics, not when you only need a one-off spreadsheet. The skill emphasizes incremental merges on facts, relationship tests between dimensions and facts, and configuration of dbt profiles for Snowflake connections, patterns that transfer to other domains once you swap the seed data.
1.4k installs

48 Data Engineering Patterns Fabric Databricks
data-engineering-patterns-fabric-databricks is a reference skill from ara.so’s Data Skills collection that gives solo builders and small data teams a searchable body of patterns for Microsoft Fabric, Azure Databricks, and PySpark. Instead of piecing lakehouse design from scattered docs, you invoke it when you need concrete guidance on pipelines, Delta Lake behavior, cluster tuning, Unity Catalog governance, streaming ingestion, or Fabric warehouse and Power BI integration.
The catalog spans on the order of six hundred patterns split across Fabric-focused areas (Data Factory pipelines, lakehouse PySpark, SQL warehouse, architecture) and Databricks-focused areas (compute, workflows, Delta, streaming, SQL/Photon). It supports Build when you are standing up analytics infrastructure, Ship when you are hardening production pipelines, and Operate when you are optimizing cost and reliability. The skill is procedural knowledge for agents: ask pattern-shaped questions and apply answers to your repo or platform config rather than expecting a single generated artifact every time.
1.3k installs

49 Game Analytics Platform Computer Vision
Game Analytics Platform - Computer Vision is an agent skill for solo builders who want a local-first fitness game powered by computer vision instead of manual rep counting. It documents how to combine YOLO v8 for objects and people, MediaPipe for skeletal pose and form checks, and a Spring Boot backend that launches and manages Python workers, with a React and Vite dashboard to start sessions and review results. Workout data lands in CSV exports suitable for analytics or coaching loops, and pyttsx3 adds spoken feedback during exercises. The skill answers setup, adding new exercise modes, bridging Java orchestration to Python vision code, and configuring real-time pose games. It suits indie hackers prototyping motion games, health tech demos, or gym-tech MVPs without shipping video to the cloud first.
1.3k installs

50 Infrastructure Cicd Data Engineering
Infrastructure CI/CD for Data Engineering teaches a reference implementation for shipping data-platform changes through GitHub Actions and Terraform on AWS. Solo builders and small data teams often paste access keys into CI until something leaks; this skill walks OIDC trust between GitHub and AWS, an S3-backed remote state bootstrap, and workflows that validate formatting on every pull request while requiring explicit approval before apply.
The project structure splits bootstrap concerns (state bucket, identity provider) from the main stack so you can evolve warehouses, buckets, and IAM without re-running one-off setup blindly. It fits when you are moving from laptop `terraform apply` to a team-reviewed pipeline for lakes, jobs, or networking around analytics workloads. Expect intermediate familiarity with Terraform modules, AWS IAM, and GitHub environments. The skill is procedural documentation plus patterns rather than a one-click deploy button; you adapt module boundaries to your org’s data estate.
1.3k installs

51 Harvard Artifacts Collection Analytics App
Harvard Artifacts Collection Analytics App is a data-skills workflow for solo builders who want a credible museum analytics project without inventing pipeline structure from scratch. It walks through pulling Harvard Art Museums API payloads (artifacts, media, colors), handling pagination and rate limits, flattening nested JSON into relational schemas, and loading MySQL or TiDB Cloud. On top of the warehouse layer you execute analytical SQL and expose results through a Streamlit front end with Plotly visualizations. The skill fits indie developers learning ETL, SQL analytics, and lightweight BI in one repo they can demo to clients or employers. Triggers align with explicit questions about Harvard API pipelines, TiDB setup, and Streamlit dashboards. Expect intermediate comfort with Python, SQL, and API keys; it is not a one-click hosted product but a build recipe you adapt to your cloud database and dashboard copy.
1.1k installs

52 Harvard Artifacts Etl Streamlit
Harvard Artifacts ETL & Streamlit is a data-skills agent package for builders who want a credible museum-analytics demo or internal research tool without designing the pipeline from scratch. It walks through Harvard Art Museums API access, pagination, and nested JSON flattening into normalized SQL tables for artifacts, media, and colors, then loads data suitable for TiDB or similar SQL engines.
On top of storage, the skill emphasizes twenty-plus ready-made analytical queries and Streamlit screens wired to Plotly so stakeholders can filter and visualize collection attributes interactively. Triggers match questions like building an ETL for Harvard data or pairing the API with Streamlit. Complexity sits at intermediate: you need Python comfort, basic SQL modeling, and local env setup for Streamlit. It is phase-specific to building the data layer but naturally extends into Grow when you ship dashboards to users or Validate when you prototype a data product idea around cultural heritage APIs.
1k installs

53 Harvard Art Museum Data Engineering
Harvard Art Museum Data Engineering is an agent skill for solo builders who want a repeatable museum-data stack instead of one-off API scripts. It walks through cloning the reference project, installing Streamlit and database drivers, configuring credentials, and running an ETL that normalizes Harvard Art Museums artifact payloads into relational tables before surfacing them in dashboards. The skill fits validate-to-build moments when you need proof that a public cultural dataset can power queries, charts, and narrative insights for a side project, client pitch, or internal research tool. Because it spans extraction, transformation, load, and viz in one flow, it reduces glue code and schema guesswork for indie data engineers who already know Python but lack a curated pattern for arts APIs.
1k installs
