
aradotso/data-skills
52 skills · 755 installs · 52 stars · GitHub
Install
npx skills add https://github.com/aradotso/data-skills

Skills in this repo

1. Apache Airflow Orchestration
Apache Airflow Orchestration is an agent skill from the Data Skills collection that teaches solo builders and small teams how to programmatically author, schedule, and monitor data workflows as DAGs in Python. It fits when you have outgrown ad-hoc scripts or fragile cron and need versionable pipelines you can test and collaborate on. The skill walks through constrained pip installs and a recommended Docker Compose layout for local development, then supports day-to-day work: choosing operators, configuring connections, using XCom between tasks, and diagnosing task failures. Primary placement is Build → Integrations because most installs start while wiring backends and ETL; the same knowledge carries into Operate when pipelines break in production. Intermediate complexity: comfort with Python and basic DevOps helps.
356 installs

2. Datatalks Data Engineering Zoomcamp
Datatalks Data Engineering Zoomcamp is an agent skill that turns the free nine-week DataTalks curriculum into actionable help for solo and indie builders learning production-style data pipelines. It is for anyone following Zoomcamp modules who needs concrete setup and homework guidance (Docker and Terraform for Module 1, orchestration with Kestra, warehousing in BigQuery, analytics engineering with dbt, batch work in Spark, and streaming with Kafka) without losing the thread across weeks of material. Use it when triggers match course work: zoomcamp environment setup, homework completion, module overviews, or running Spark and Terraform labs. The skill matters because data engineering spans many tools at once; having procedural, module-aware assistance reduces friction on GCP, containers, and git-based project repos while you build real pipeline muscle.
It complements self-paced study when you are not in an active cohort but still want the same ordered path the course defines.
349 installs

3. Altimate Data Engineering Skills
Altimate Data Engineering Skills is a Claude Code skill bundle from ara.so that teaches agents how experienced analytics engineers approach dbt and Snowflake, not just SQL syntax. It covers the full model lifecycle: scaffolding models, fixing compilation failures, adding tests and documentation, safe refactors, incremental patterns, and SQL-to-dbt migration. On Snowflake it adds cost visibility and optimization workflows keyed by query text or ID. A delegation skill hands off complex jobs to the altimate-code CLI when chat-based iteration is not enough. Solo and indie builders shipping warehouse-backed products can install it when they want consistent dbt project hygiene and faster debugging without hiring a full-time analytics engineer on day one. Triggers align with everyday phrases like “help me create a dbt model” or “find expensive queries in Snowflake,” so discovery stays practical during Build and later Operate tuning.
1 install

4. Amee Joshi Data Engineering Portfolio
Amee Joshi Data Engineering Portfolio is a reference agent skill from ara.so’s Data Skills collection that shows production-style Azure analytics builds for solo builders and small teams planning a lakehouse or warehouse. It is not a one-click deployer; it is curated narrative and structure you invoke when you want concrete examples of medallion layering, cloud ingestion, transformation, modeling, and BI handoff. Triggers cover portfolio walkthroughs, pipeline design, medallion and lakehouse questions, and end-to-end Databricks-style patterns. Use it during Validate when scoping a data product, during Build when aligning folder and pipeline layout, and during Operate when evolving incremental loads and gold-layer semantics.
The skill optimizes learning velocity for builders who already target Microsoft’s data plane and want star-schema and metadata-driven ingestion spelled out in a single portfolio-shaped artifact your agent can cite while drafting your own repo.
1 install

5. Analytics Tracking Automation
Analytics Tracking Automation is an ara.so Data Skills package that turns “we need GA4” into an agent-executable pipeline. Solo builders and small teams point it at a URL or Shopify store; it crawls pages, clusters them by intent (product, pricing, contact, and similar), proposes an event schema, pushes changes into GTM, validates in preview, and publishes when checks pass. It also supports auditing an existing container when tracking drifted or never matched the product map. The skill fits after you have something to measure: post-launch growth work and pre-launch ship checks when you want tags correct before traffic hits. It expects comfort with Google Analytics 4, Tag Manager, and API credentials, not a spreadsheet-only analytics workflow.
1 install

6. Data Engineering Medallion Pipeline
Data Engineering Medallion Pipeline is a workflow skill from the ara.so Data Skills collection that teaches agents how to implement a full medallion lakehouse pattern on open-source tooling. Solo builders use it when they need a credible Bronze, Silver, and Gold path without immediately buying a managed warehouse: MinIO stands in for object storage, Airbyte handles ingestion into PostgreSQL, dbt owns transformations and tests, and Airflow schedules the DAGs with Grafana and Prometheus for basic observability. The skill matches long-tail intents like configuring Airbyte with MinIO and Postgres, creating dbt layer models, and deploying the stack locally with Docker Compose. It suits indie SaaS founders validating analytics plumbing, consultants demoing ELT to clients, and engineers learning modern data stack wiring.
After the pipeline runs locally, you typically harden tests, promote environments, and connect BI; this skill focuses on getting that first integrated loop working with clear layer boundaries.
1 install

7. Data Engineering Patterns Fabric Databricks
data-engineering-patterns-fabric-databricks is a reference skill from ara.so’s Data Skills collection that gives solo builders and small data teams a searchable body of patterns for Microsoft Fabric, Azure Databricks, and PySpark. Instead of piecing lakehouse design from scattered docs, you invoke it when you need concrete guidance on pipelines, Delta Lake behavior, cluster tuning, Unity Catalog governance, streaming ingestion, or Fabric warehouse and Power BI integration. The catalog spans on the order of six hundred patterns split across Fabric-focused areas (Data Factory pipelines, lakehouse PySpark, SQL warehouse, architecture) and Databricks-focused areas (compute, workflows, Delta, streaming, SQL/Photon). It supports Build when you are standing up analytics infrastructure, Ship when you are hardening production pipelines, and Operate when you are optimizing cost and reliability. The skill is procedural knowledge for agents: ask pattern-shaped questions and apply answers to your repo or platform config rather than expecting a single generated artifact every time.
1 install

8. Data Engineering Study Material
Data Engineering Study Material is a journey-wide reference skill from ara.so’s Data Skills collection. It points your agent at a comprehensive GitHub study guide covering data engineering fundamentals, modern stack components, pipeline patterns, cloud data platforms, governance, and interview prep. Solo builders use it when they are exploring whether a product needs a warehouse, lakehouse, or streaming path, during Validate when scoping MVP data requirements, or anytime they need plain-language explanations of ETL versus ELT, orchestration, or observability without enrolling in a course.
The skill is research-oriented: clone the repository and let the agent explain concepts, compare tools, or outline learning paths. It does not deploy infrastructure or run jobs; it reduces conceptual risk before you pick DuckDB, BigQuery, Kafka, or Airflow for real.
1 install

9. Employee Performance Analytics Hr
employee-performance-analytics-hr is a Data Science skill for solo builders and tiny teams who need HR analytics without a full BI suite. It packages SQL and Python workflows to aggregate employee KPIs, compare departments, visualize productivity trends, and generate performance reports from structured workforce data. Install follows a standard clone, virtualenv, and pip install -r requirements.txt path, then you run the analytics pipeline described in the skill: relational feature engineering in SQLite, pandas transformations, and matplotlib or seaborn charts for dashboards. Use it in the Grow phase when you are measuring workforce efficiency, supporting internal ops, or prototyping HR-facing SaaS analytics. It is intermediate complexity because you must bring clean employee datasets and comfort with SQL plus Python data tooling. The skill is phase-specific analytics work, not a generic coding assistant.
1 install

10. Enterprise Data Engineering Pipeline Ssis Pyspark
Enterprise Data Engineering Pipeline (SSIS + PySpark) is an agent skill for solo builders and small teams who need a credible Microsoft-stack data warehouse without piecing docs together ad hoc. It walks through ingesting Sales, Products, and Customers CSVs, transforming with SQL Server Integration Services, loading a star schema of facts and dimensions, and finishing with Python quality checks plus PySpark for heavy aggregates. Use it when triggers mention enterprise ETL, SSIS packages, dimensional modeling, or scaling row counts beyond what pure T-SQL comfortably handles.
It matters because analytics-ready warehousing is usually where indie SaaS and internal tools stall; this packages orchestration, modeling, and scale-up analytics into one procedural playbook your coding agent can follow step by step.
1 install

11. Game Analytics Platform Computer Vision
Game Analytics Platform - Computer Vision is an agent skill for solo builders who want a local-first fitness game powered by computer vision instead of manual rep counting. It documents how to combine YOLO v8 for objects and people, MediaPipe for skeletal pose and form checks, and a Spring Boot backend that launches and manages Python workers, with a React and Vite dashboard to start sessions and review results. Workout data lands in CSV exports suitable for analytics or coaching loops, and pyttsx3 adds spoken feedback during exercises. The skill answers setup, adding new exercise modes, bridging Java orchestration to Python vision code, and configuring real-time pose games. It suits indie hackers prototyping motion games, health tech demos, or gym-tech MVPs without shipping video to the cloud first.
1 install

12. Harvard Artifacts Collection Analytics App
Harvard Artifacts Collection Analytics App is a data-skills workflow for solo builders who want a credible museum analytics project without inventing pipeline structure from scratch. It walks through pulling Harvard Art Museums API payloads (artifacts, media, colors), handling pagination and rate limits, flattening nested JSON into relational schemas, and loading MySQL or TiDB Cloud. On top of the warehouse layer you execute analytical SQL and expose results through a Streamlit front end with Plotly visualizations. The skill fits indie developers learning ETL, SQL analytics, and lightweight BI in one repo they can demo to clients or employers. Triggers align with explicit questions about Harvard API pipelines, TiDB setup, and Streamlit dashboards.
Expect intermediate comfort with Python, SQL, and API keys; it is not a one-click hosted product but a build recipe you adapt to your cloud database and dashboard copy.
1 install

13. Harvard Artifacts Collection Analytics Pipeline
Harvard Artifacts Collection Analytics Pipeline is a data-skills package from ara.so that documents a complete museum analytics stack for solo builders learning or demoing data engineering. When triggers fire (building a Harvard API pipeline, ETL for artifacts, SQL plus Streamlit dashboards), the skill walks through cloning the reference repository, installing Python dependencies, and following the architecture API → ETL → SQL → Analytics → Visualization. Extraction covers Harvard Art Museums API artifact metadata, media, and color data; transformation and load target MySQL or TiDB Cloud; the analytics layer includes twenty-plus SQL queries; presentation uses Streamlit with Plotly. It fits indie builders shipping internal research tools, portfolio projects, or vertical SaaS around cultural datasets rather than ad-hoc notebook scripts. Prism shelves it under Build backend because the primary work is pipeline and datastore implementation, not launch SEO or production SRE, though you might later move the dashboard to Grow analytics once deployed.
1 install

14. Harvard Artifacts Collection Data Engineering
Harvard Artifacts Collection Data Engineering is an ara.so data skill that gives solo builders a complete reference stack for museum collection data. It connects to the Harvard Art Museums API, respects pagination and rate limits, and runs batch ETL that maps nested JSON into three relational tables for metadata, media, and colors. On top of that warehouse it ships more than twenty canned SQL analyses and Streamlit dashboards so you can explore periods, media types, and color distributions without writing glue code from scratch.
Use it when you want a credible portfolio piece or internal prototype that proves you can integrate a public REST API, design a small star-style schema, and ship analyst-facing UI. It is ideal for learning data engineering patterns before adapting them to your own domain.
1 install

15. Harvard Artifacts Collection Data Engineering Analytics
Harvard Artifacts Collection Data Engineering Analytics is a build-phase agent skill that walks solo builders through a complete museum-data product: ingest from the Harvard Art Museums API, transform nested JSON into normalized tables, load into SQL (MySQL or TiDB Cloud), and explore results in Streamlit with Plotly. It is aimed at developers learning production-shaped ETL (handling pagination, rate limits, and batch writes) while still shipping something visible to stakeholders. Documented triggers cover phrases like building an ETL pipeline for Harvard data, setting up a Streamlit analytics dashboard, and developing a small artifacts data warehouse. The skill points to an external GitHub project for clone-and-run setup rather than embedding every script inline. Use it when you want a credible portfolio piece or internal analytics prototype, not when you only need a one-off CSV export. Agents should treat it as a full-stack data template: SQL layer plus app layer, with analytics queries already scoped.
1 install

16. Harvard Artifacts Collection Etl Analytics
Harvard Artifacts Collection ETL Analytics is an agent skill for solo builders who want a real museum-data project instead of toy CSVs. It walks through Harvard Art Museums API access, disciplined extract-transform-load into a normalized SQL schema, and opinionated analytics queries you can run immediately. Streamlit is the presentation layer so you get an interactive dashboard without building a full frontend first. Triggers match concrete intents: museum pipelines, artifact visualization, and SQL-plus-Python data engineering.
It suits indie hackers learning pipelines, portfolio data projects, or internal tools that need API-backed collections. You still bring your own hosting, secrets, and compliance choices; the skill focuses on structure and implementation patterns rather than one-click deploy.
1 install

17. Harvard Artifacts Data Engineering Analytics
Harvard Artifacts Data Engineering & Analytics is an agent skill from the ara.so Data Skills collection that walks solo builders through a complete museum-data pipeline: pull paginated records from the Harvard Art Museums API, normalize nested JSON into relational schemas, load batches into MySQL or TiDB Cloud, run analytical SQL, and surface results in Streamlit with Plotly. It is aimed at developers who want a credible portfolio piece or internal prototype without inventing architecture from scratch: rate limits, transforms, and visualization hooks are spelled out as a single narrative. Use it when you are in the build phase and need to prove you can ship ETL, warehousing, and lightweight BI together on real cultural-heritage metadata. It is less about one-off charts and more about repeatable ingestion and query patterns you can adapt to other REST sources after the museum example lands.
1 install

18. Harvard Artifacts Data Engineering App
Harvard Artifacts Data Engineering App is a data-skills template for agents building a real-world analytics stack on top of the Harvard Art Museums public API. Solo builders learning data engineering (or shipping a niche cultural analytics side project) get a coherent story: pull paginated artifact records respectfully, normalize nested JSON into relational tables, run meaningful SQL summaries, and expose insights through Streamlit instead of a one-off CSV dump. The skill encodes schema decisions across metadata, media, and color facets so agents do not reinvent brittle one-table designs.
Pre-authored analytical queries accelerate exploration of collections, mediums, and visual attributes without forcing the user to write SQL from scratch on day one. Streamlit becomes the thin product layer for demos, internal tools, or portfolio pieces. Triggers in the skill readme map directly to phrases like building an ETL pipeline, museum dashboard, or Harvard API integration, which keeps discovery aligned with how indie builders actually prompt their coding agents. It is intentionally educational and reproducible rather than a hosted SaaS-in-a-box, so expect to own database hosting, secrets, an…
1 install

19. Harvard Artifacts Data Engineering Pipeline
Harvard Artifacts Data Engineering Pipeline is an agent skill from ara.so’s Data Skills collection that guides solo builders through a complete museum-data project. You integrate the Harvard Art Museums API, build an ETL that flattens nested JSON into normalized tables, load MySQL or TiDB Cloud with a thoughtful schema, and expose insights via SQL plus a Streamlit front end with Plotly. It fits indie developers who want a credible data-engineering portfolio piece, internal tooling for collection analytics, or a template for similar cultural-heritage APIs. Use it when triggers mention Harvard API pipelines, artifact ETL, SQL schema design for collections, or Streamlit dashboards for museum data, not when you only need a one-off chart without persistence. The flow emphasizes practical patterns: respectful API usage, relational modeling, repeatable queries, and lightweight visualization so the result is demonstrable end to end.
1 install

20. Harvard Artifacts Data Pipeline
Harvard Artifacts Data Pipeline is an agent skill from the Data Skills collection for builders who want a concrete museum-data ETL reference instead of abstract pipeline theory. It documents how to pull Harvard Art Museums API payloads, flatten nested JSON into relational tables, load MySQL or TiDB, and expose analytics through Streamlit with Plotly visuals.
Triggers in the skill metadata cover building ETL workflows, SQL querying, and dashboard setup, ideal when you are prototyping a data product or portfolio piece in the Build phase. Solo developers benefit because the architecture string and dependency list give agents a full stack anchor: requests for API access, pandas for transforms, and streamlit for demo UIs. It does not replace production orchestration choices like Airflow or dbt unless you extend the pattern. Configure API keys via .env and treat rate limits and licensing of museum data as your compliance responsibility.
1 install

21. Harvard Artifacts Etl Analytics
harvard-artifacts-etl-analytics teaches coding agents to build a complete museum-data product: paginated harvest from the Harvard Art Museums API, transformation of nested JSON into normalized tables, batch inserts into MySQL or TiDB Cloud, analytical queries across artifacts and media attributes, and an interactive Streamlit front end with Plotly charts. Solo builders use it as a concrete template when they need to demonstrate data engineering credibility (scoped API access, sensible schema design, and a visible dashboard), not a one-off CSV script. Triggers match common search phrasing for ETL, Streamlit analytics, and Harvard collection queries. You still need your own API key and database credentials; the skill encodes patterns from the ara.so Data Skills collection rather than hosting data itself.
1 install

22. Harvard Artifacts Etl Pipeline
Harvard-artifacts-etl-pipeline is an agent skill from ara.so’s Data Skills collection for solo builders who want a repeatable museum-data stack instead of scattered scripts. It guides implementation of Harvard Art Museums API integration with responsible pagination and rate limits, transformation and loading of metadata, media, and color attributes into MySQL, and SQL-backed analytics on the loaded schema.
On top of that relational layer, it covers Streamlit dashboards wired to Plotly so you can explore collection trends and artifact attributes interactively. Use it when triggers match building an ETL for Harvard API data, standing up artifact analytics, streaming-style extraction workflows, or querying a local Harvard collection database you maintain. It targets intermediate builders comfortable with Python data tooling and a small relational store. The skill emphasizes a clear pipeline architecture so agents produce maintainable engineering artifacts, not a one-time CSV dump.
1 install

23. Harvard Artifacts Etl Streamlit
Harvard Artifacts ETL & Streamlit is a data-skills agent package for builders who want a credible museum-analytics demo or internal research tool without designing the pipeline from scratch. It walks through Harvard Art Museums API access, pagination, and nested JSON flattening into normalized SQL tables for artifacts, media, and colors, then loads data suitable for TiDB or similar SQL engines. On top of storage, the skill emphasizes twenty-plus ready-made analytical queries and Streamlit screens wired to Plotly so stakeholders can filter and visualize collection attributes interactively. Triggers match questions like building an ETL for Harvard data or pairing the API with Streamlit. Complexity sits at intermediate: you need Python comfort, basic SQL modeling, and local env setup for Streamlit. It is phase-specific to building the data layer but naturally extends into Grow when you ship dashboards to users or Validate when you prototype a data product idea around cultural heritage APIs.
1 install

24. Harvard Artifacts Etl Streamlit Analytics
Harvard Artifacts ETL & Streamlit Analytics is an agent skill from ara.so’s Data Skills collection that walks solo builders through a complete cultural-heritage data pipeline.
You extract Harvard Art Museums API records (artifacts, media, and color information), transform them for analysis, load them into SQL, run analytical queries, and surface results in a Streamlit dashboard. It is aimed at indie developers learning production-shaped ETL without a separate data team: one repo, clear architecture (API → ETL → database → analytics → visualization), and Python dependencies via requirements.txt. Use it when triggers match building museum analytics, designing artifact-metadata workflows, or pairing SQL with Python visualization. It matters because it turns a public museum API into a portfolio-grade data-engineering story you can extend for other open cultural datasets.
1 install

25. Harvard Art Museum Data Engineering
Harvard Art Museum Data Engineering is an agent skill for solo builders who want a repeatable museum-data stack instead of one-off API scripts. It walks through cloning the reference project, installing Streamlit and database drivers, configuring credentials, and running an ETL that normalizes Harvard Art Museums artifact payloads into relational tables before surfacing them in dashboards. The skill fits validate-to-build moments when you need proof that a public cultural dataset can power queries, charts, and narrative insights for a side project, client pitch, or internal research tool. Because it spans extraction, transformation, load, and viz in one flow, it reduces glue code and schema guesswork for indie data engineers who already know Python but lack a curated pattern for arts APIs.
1 install

26. Harvard Art Museum Data Pipeline
harvard-art-museum-data-pipeline teaches coding agents to deliver a complete data-engineering slice: pull collection records from the Harvard Art Museums API, respect pagination and rate limits, transform nested JSON into a normalized relational model, and expose insights through SQL plus an interactive Streamlit front end with Plotly charts.
Solo builders learning pipeline patterns or shipping a niche cultural analytics demo get a concrete stack (Python, SQL, Streamlit) without inventing schema and query packs from scratch. Prism places it on Build/backend because the heavy lifting is ingestion, storage, and analytics APIs, while the dashboard is the consumption layer for validators and stakeholders. Triggers in SKILL.md explicitly cover ETL setup, dashboard builds, and SQL analysis workflows, making it easy to invoke when a user asks how to engineer museum or similar open cultural datasets.
1 install

27. Harvard Art Museum Etl Analytics
Harvard Art Museum ETL Analytics is an agent skill for solo builders who want a complete museum-data engineering story instead of disconnected scripts. It walks through pulling artifact records from the Harvard Art Museums API, reshaping nested JSON into tables suited for MySQL or TiDB Cloud, running analytical SQL on the loaded collection, and exposing results in Streamlit with Plotly. The documented flow matches how indie developers prove data chops: reproducible ETL, honest schema choices, and a demo app reviewers can click through. Triggers align with questions about Harvard API pipelines, artifact schemas, and museum analytics dashboards. You install dependencies from requirements.txt after cloning the reference project, then iterate on extract, transform, load, and viz in that order. It is aimed at builders comfortable with Python and SQL who need a structured capstone rather than ad-hoc notebook snippets.
1 install

28. Harvard Art Museums Data Engineering App
Harvard Art Museums Data Engineering App is a project-oriented agent skill from the ara.so Data Skills collection. It guides a solo builder through standing up a real ETL pipeline that pulls Harvard Art Museums API artifact data, transforms it into SQL tables, loads MySQL or TiDB Cloud, and surfaces interactive analytics in Streamlit with Plotly.
Use it when you want a demonstrable data-app portfolio piece or a starting point for cultural-collection analytics rather than a generic CRUD tutorial. The skill emphasizes architecture clarity (API, pipeline, database, and visualization layers) so an agent can scaffold queries, transformations, and dashboard views in one pass. Intermediate builders comfortable with Python, SQL connectors, and basic cloud database setup get the most value; you still need your own API keys and environment secrets for Harvard and database hosting.
1 install

29. Harvard Art Museums Data Engineering Pipeline
Harvard Art Museums Data Engineering Pipeline is a build-phase skill for solo builders who want a complete, demonstrable data stack using a real public API. It walks through collecting artifact records from the Harvard Art Museums API, transforming them in Python, loading into a relational database, running SQL analytics, and exposing insights through a Streamlit app. Triggers align with portfolio projects: ETL setup, museum analytics dashboards, batch processing, and schema design for collections data. You clone the reference repository, install Python dependencies, and configure API and database credentials via environment variables. The skill is intermediate in operational detail: you need comfort with SQL, Python packaging, and basic deployment assumptions for your DB host. It is not a managed cloud kit; you supply infrastructure and keys. Outcome is a reproducible analytics product you can extend for other cultural or catalog APIs using the same architectural pattern.
1 install

30. Harvard Art Museums Data Pipeline
Harvard Art Museums Data Pipeline is an agent skill for indie builders and data-curious founders who want a credible full-stack data project without inventing architecture from scratch.
It walks through building the Harvard Artifacts Collection-style app: pulling paginated artifact records from the Harvard Art Museums API, flattening nested JSON into relational tables, loading into MySQL or TiDB Cloud, running analytical SQL over the collection, and exposing charts through Streamlit with Plotly. Triggers in SKILL.md mirror how people actually search (ETL for museum data, Streamlit dashboards, SQL analytics on artifacts), so the skill fits when you are proving data chops, building a demo for investors, or shipping a niche cultural analytics SaaS. You will need API keys, a database, and comfort with Python data tooling. It is phase-specific to Build because the outcome is a working pipeline and dashboard artifact, not launch distribution or growth loops. After the pipeline runs, typical next steps are Ship testing, hardening secrets for the API key, and Launch SEO if the dashboard is public.
1 install

31. Harvard Art Museums Etl Analytics
Harvard Art Museums ETL & Analytics is an agent skill from ara.so’s Data Skills collection that walks solo builders through pulling open museum collection data, normalizing it into a small relational model, and shipping analyst-friendly views. It targets developers who want a credible demo or internal tool without designing an integration from scratch: agents can scaffold API calls, table DDL, transformation logic, and Streamlit pages that answer real questions about artists, periods, and media. Use it when you are validating a data product idea, building a portfolio pipeline, or adding a curated public dataset to an existing SaaS. The flow emphasizes practical engineering (bounded API usage, clean star-ish tables, reusable SQL, and dashboard charts) so you spend time on insights rather than boilerplate.
It is less about generic ML training and more about classical analytics engineering on a well-documented cultural heritage API.
1 install

32. Harvard Art Museums Etl Pipeline
Harvard Art Museums ETL Pipeline is an agent skill from ara.so’s Data Skills collection for solo builders who want a concrete museum-data engineering project. It guides you through extracting paginated artifact records from the Harvard Art Museums API, reshaping nested JSON into relational tables, loading batches into MySQL or TiDB Cloud, and exposing insights through Streamlit with Plotly. The skill is aimed at indie developers learning ETL design, SQL schema choices for cultural-metadata fields, and lightweight analytics apps without standing up a full data platform. Use it when you need a repeatable pattern for API-to-warehouse flows and an explorable dashboard for collection statistics, artist distributions, or object attributes, not when you only need a one-off CSV export.
1 install

33. Iac Data Engineering Terraform
IaC Data Engineering Terraform is an agent skill from ara.so’s Data Skills collection that encodes Infrastructure-as-Code patterns for solo data builders on AWS. It walks through provisioning S3 for lake or staging storage, EC2 for processing workloads, and IAM policies that keep pipeline access explicit, using Terraform as the single declarative interface. Prerequisites assume Terraform and AWS CLI on the machine and configured credentials, matching how indie engineers bootstrap a first pipeline environment without clicking through the console. The skill fits builders who treat infrastructure as versioned code alongside ETL jobs, and it remains relevant when you extend stacks in Operate or redeploy after Validate proves a prototype.
It is pattern-oriented rather than a one-click deploy of a named product, so agents adapt modules to your naming and regions while preserving state discipline.
1 install

34. Iac Terraform Data Engineering
IaC for Data Engineering with Terraform is an agent skill from the ara.so Data Skills collection that teaches solo and indie builders how to stand up AWS infrastructure for analytics and pipelines using Infrastructure as Code. It centers on Terraform configurations for S3 storage, EC2 compute, and IAM permissions so environments stay reproducible, reviewable in git, and aligned with data-engineering workflows rather than one-off console clicks. The skill walks through what the project delivers (templates, lifecycle operations, and state discipline) and assumes you already have AWS access and the Terraform CLI installed. Use it when triggers match tasks like setting up Terraform for data engineering, provisioning S3 and EC2 with IaC, managing state for data platforms, or safely destroying lab stacks. For a one-person team shipping agents or ETL jobs, this reduces drift between local experiments and shared buckets or roles. Pair it with your pipeline code and CI plans so infra changes ride the same review process as application changes.
1 install

35. Infrastructure Cicd Data Engineering
Infrastructure CI/CD for Data Engineering teaches a reference implementation for shipping data-platform changes through GitHub Actions and Terraform on AWS. Solo builders and small data teams often paste access keys into CI until something leaks; this skill walks OIDC trust between GitHub and AWS, an S3-backed remote state bootstrap, and workflows that validate formatting on every pull request while requiring explicit approval before apply. The project structure splits bootstrap concerns (state bucket, identity provider) from the main stack so you can evolve warehouses, buckets, and IAM without re-running one-off setup blindly.
It fits when you are moving from laptop `terraform apply` to a team-reviewed pipeline for lakes, jobs, or networking around analytics workloads. Expect intermediate familiarity with Terraform modules, AWS IAM, and GitHub environments. The skill is procedural documentation plus patterns rather than a one-click deploy button; you adapt module boundaries to your org’s data estate.
1 install

36. Llm Intelligent Public Opinion Analytics
LLM intelligent public opinion analytics is a deployment and operations skill for a full opinion platform: it ingests hot lists from many Chinese mainstream platforms, applies LLM analysis (including video), and exposes conversational queries plus clustering, sentiment, and alerting. It targets builders who need brand, market, or narrative visibility across Weibo, Bilibili, Zhihu, Baidu, and similar sources, not a single-platform scraper. Documented triggers include setting up monitoring, analyzing social trending topics, configuring push notifications, and building dashboards. Installation assumes browser drivers and related prerequisites in the upstream project. Solo operators use it when they must aggregate cross-platform buzz, automate sentiment passes, and route high-signal topics to WeChat, email, or Telegram without building crawlers from scratch.
1 install

37. Llm Public Opinion Analytics
LLM-Based Public Opinion Analytics Assistant is an agent skill from ara.so’s Data Skills collection for solo builders who need trend and sentiment intelligence without hiring a data team. It combines web scraping from major Chinese social and news surfaces with large-language-model analysis so you can query hot rankings in conversation, run topic-specific searches, cluster related narratives, and score sentiment, including signals pulled from video when the pipeline supports it. The design targets operators who want a repeatable monitoring system rather than manual copy-paste from hot lists.
Installation assumes Python 3.8 or newer and a MySQL database to store crawled items and analysis results. Push notifications let you react when a topic spikes across Weibo, Bilibili, Douyin, Baidu, and similar sources. Use it when you are validating narrative risk for a launch, tracking brand mentions during growth, or operating a lightweight news desk from your agent workflow.
1 install

38. Llm Public Opinion Analytics Assistant
LLM Public Opinion Analytics Assistant is an ara.so Data Skills package that wires together broad hot-list crawling and large-language-model analysis so indie builders can watch what people are saying without manually hopping across platforms. It ingests trending data from many lists, exposes search and clustering through a web interface, and layers sentiment and trend signals on top for quicker interpretation. Keyboard shortcuts help you run the crawler day to day, while outbound pushes keep you out of the dashboard when something spikes. The skill fits solo operators who ship products or content and need a pragmatic opinion radar, not a full enterprise social-listening suite, backed by LLM summarization rather than static keyword alerts. Configure platforms, notification channels, and analysis flows when you are ready to operationalize monitoring after validation or launch.
1 install

39. Mm2 Analytics Dashboard Roblox
mm2-analytics-dashboard-roblox packages a Murder Mystery 2–focused analytics stack for Roblox players and small creators who want inventory discipline and clearer gamepass decisions without building charts from scratch. The skill describes automated inventory capture, dashboard visualization, win/loss analytics, and strategy recommendations aimed at knife skins, passes, and trade timing. Installation paths include a setup.sh --install flow (after making the script executable with chmod) and separate Node and Python dependency installs for teams that prefer manual control.
Trigger phrases in the SKILL frontmatter mirror how builders search when sync breaks or when they want exported performance reports. It is niche by design: valuable when MM2 is your growth surface and you treat Roblox inventory like a portfolio you need to measure, not a casual inventory screen.
1 install

40. Mm2 Analytics Roblox Toolkit
mm2-analytics-roblox-toolkit is a data-skills package from ara.so that teaches agents how to help solo creators and serious MM2 players use the Murder Mystery 2 Analytics Dashboard: inventory tracking, performance metrics, visualization, and export tooling for Roblox’s Murder Mystery 2 mode. Triggers cover dashboard setup, knife collection analysis, gamepass statistics, gameplay exports, and data-driven strategy optimization, making it a niche but deep companion when your product or content strategy revolves around MM2 trading, grinding, or community analytics rather than generic SaaS growth. Install and configure the dashboard, wire stats sources, then ask the agent to interpret win rates by role, completeness of rare skins, or patterns that inform in-game decisions. It spans Grow phase measurement and Validate-style experiments when you are testing whether a trading or content angle is worth pursuing. Expect game-domain specificity: it will not generalize to other Roblox titles without adaptation. Deliverables are configured trackers, charts, and exported datasets you can feed into spreadsheets or custom models.
1 install

41. Mm2 Analytics Roblox Tracker
mm2-analytics-roblox-tracker is an ara.so Data Skills package for solo Roblox players and indie creators who want data-backed Murder Mystery 2 decisions. It combines an analytics dashboard with inventory management (knife skins, gamepasses, collection completeness) and strategy analysis through win/loss and performance visualizations. AI-assisted insights support pattern recognition and predictive views on inventory value.
The project targets multi-platform use (desktop, tablet, mobile, web) and exports statistics as JSON or CSV for spreadsheets or custom tools. Install via chmod +x setup.sh and ./setup.sh --install or manual clone from the published repository. Use it when you already play MM2 and want to compound results in the Grow phase, or when integrating analytics hooks into your own codebase. It is not a general Roblox studio builder; it is a specialized tracker and dashboard for one game’s metagame economy and performance.
1 install

42. Mm2 Roblox Analytics Toolkit
MM2 Roblox Analytics Toolkit is a niche agent skill from the ara.so Data Skills collection that wires Murder Mystery 2 players into an analytics stack for knife skins, gamepasses, win/loss ratios, and AI-assisted strategy tips. Installation centers on cloning the mm2-analytics-dashboard-2026 repository, running setup.sh or manual Node and Python installs, and pointing API keys at a local data directory. Triggers such as analyzing inventory, configuring a stats tracker, and exporting collection data tell agents when to load procedural steps instead of generic Roblox advice. Solo builders who treat MM2 trading or competitive play as a side business can use it to standardize exports and visualization patterns. It does not replace official Roblox analytics for studio games; it optimizes personal MM2 performance and collection intelligence once you already play the mode regularly.
1 install

43. Mm2 Roblox Analytics Tracker
mm2-roblox-analytics-tracker is an agent skill from ara.so's Data Skills line aimed at Roblox Murder Mystery 2 players and creators who want inventory discipline without spreadsheets. It bundles inventory management for knives and collectibles, performance analytics across roles, interactive dashboards, export paths, and optional AI-flavored insights such as value modeling and trade suggestions.
Solo builders maintaining MM2 side projects or community tools can use it to configure trackers, run analytics passes, and troubleshoot dashboard setup when triggers mention inventory optimization or stat exports. It is domain-specific to MM2, not a generic Roblox analytics framework, so expect Roblox API and game-economy assumptions in the workflow. Use when you are growing engagement through better collection strategy rather than when you are only prototyping a new Roblox experience from scratch.
1 install

44. Options Analytics Agent Langgraph
Options Analytics Agent with LangGraph is an agent skill from the ara.so Data Skills collection for solo builders who want a serious financial-options assistant instead of one-off scripts. It walks you through wiring LangGraph for multi-agent steps, pulling live market data from Polygon.io, caching and retrieving context with ChromaDB RAG, and keeping chat state in SQLite so analysis survives restarts. The README frames professional outputs (Greeks, sentiment-style signals, and exportable artifacts) plus a FastAPI service boundary so the agent can run like a small backend you own. Use it when you are in the build phase and need a repeatable pattern for options research bots, internal trading copilots, or fintech demos where latency, caching, and API keys matter. It assumes comfort with Python services, external market APIs, and vector stores; it is not a turnkey brokerage integration or compliance sign-off.
1 install

45. Realtime Cinema Data Engineering Pipeline
Realtime Cinema Data Engineering Pipeline teaches solo builders how to assemble a production-style streaming analytics stack: Kafka ingests events, PostgreSQL stores Medallion layers, Airflow schedules transforms, and Streamlit surfaces metrics. It is aimed at learners who want a concrete cinema-themed domain rather than abstract ETL slides. Triggers cover bronze-silver-gold layering, streaming ELT, and configuring Kafka clients end to end.
Installation assumes Docker Compose and a cloned reference repository with venv-managed Python dependencies. The skill fits indie SaaS or internal dashboards where near-real-time counts matter (ticket sales, session events, or similar high-volume feeds) without outsourcing pipeline design to a managed-only black box. Use it when you already committed to self-hosted or compose-based infra and need agent-guided wiring across ingestion, warehouse, orchestration, and viz in one narrative arc.
1 install

46. Retail Etl Medallion Pipeline
Retail ETL Medallion Pipeline is an agent skill that walks solo builders and small data teams through a production-style Medallion Architecture for retail and hypermarket analytics. It ingests raw sales, inventory, and catalog data into Bronze, applies cleaning and domain rules in Silver (including shrinkage, recipe conversions, and rebate tiers), and publishes consolidated Gold models suitable for reporting. The skill is aimed at builders who need a credible warehouse pattern instead of one-off notebooks, especially when branches, suppliers, and product hierarchies complicate joins. Use it when triggers mention medallion layers, retail ETL, Airflow plus Spark, or designing analytics for inventory and sales. It matters because it encodes real retail edge cases that generic ETL templates skip, so agents produce layered SQL and pipeline structure you can extend rather than reinvent.
1 install

47. Retail Etl Pipeline Medallion
Retail ETL Pipeline Medallion is a data-engineering agent skill that walks solo builders through a full retail analytics pipeline using Medallion Architecture. It targets operators who receive messy branch-level sales and inventory exports and need governed layers instead of one-off scripts. The skill covers Bronze ingestion of raw feeds, Silver resolution of shrinkage and product-recipe logic, and Gold metrics for rebates, stock, and consolidated reporting.
PySpark and SQL Server are the assumed execution surfaces, with emphasis on stored procedures and layer boundaries. Invoke when you are building, not merely researching, a warehouse for multi-location retail, and you want a repeatable pattern rather than bespoke notebooks per dataset.
1 install

48. Roblox Mm2 Analytics Toolkit
Roblox MM2 Analytics Toolkit packages data-skills workflows for Murder Mystery 2 players who want inventory intelligence beyond in-game UI. Triggers cover analyzing knife collections, configuring stats trackers, exporting gameplay metrics, and standing up a dashboard for win/loss ratios by role. Installation paths include a cloned repo with setup.sh --install or separate npm and pip dependency installs, positioning it as a local toolkit rather than a hosted SaaS. For Prism’s solo-builder audience, it is niche: valuable when you ship or seriously play MM2 content and need trading signals, collection completeness, or gamepass experimentation backed by numbers. It is not a general Roblox studio DevEx skill; it targets MM2-specific analytics. Builders outside that game should skip it; MM2 creators and competitive players gain structured reporting and inventory views that support Grow-phase iteration on engagement and monetization choices.
1 install

49. Snowflake Dbt Airbnb Analytics
snowflake-dbt-airbnb-analytics is a reference analytics-engineering skill from the ara.so Data Skills collection. It walks a solo builder through loading open Inside Airbnb datasets into Snowflake, modeling them with dbt across staging, intermediate, and mart layers, hardening quality with tests, and surfacing results in Streamlit. Triggers cover standing up profiles, incremental monthly aggregates from calendar data, and implementing modern medallion-style layering without hand-waving SQL structure. Use it when you are learning or shipping a credible warehouse pattern for marketplace/listing analytics, not when you only need a one-off spreadsheet.
The skill emphasizes incremental merges on facts, relationship tests between dimensions and facts, and configuration of dbt profiles for Snowflake connections, patterns that transfer to other domains once you swap the seed data.
1 install

50. Terraform Data Engineering Iac
Terraform Data Engineering IaC is an agent skill from ara.so’s Data Skills collection that teaches Infrastructure-as-Code patterns for solo and indie data builders who need AWS without manual console drift. It walks through provisioning S3 for lake storage, EC2 for processing, and IAM for secure access, with Terraform state as the source of truth for changes. Use it when triggers fire around setting up data engineering infrastructure, automating data platform provisioning, or managing pipeline-related cloud resources as code. You need an AWS account, Terraform CLI, and AWS CLI configured with permissions for S3, EC2, and IAM. The skill fits builders shipping pipelines or analytics backends who want the same environment every deploy, not one-off buckets and instances. It is a task integration for cloud provisioning, not a full MLOps or orchestration playbook; pair it with your orchestrator and monitoring choices separately.
1 install

51. Terraform Data Engineering Infrastructure
Terraform Data Engineering Infrastructure is an agent skill from the ara.so Data Skills collection that teaches Infrastructure-as-Code patterns for analytics and pipeline teams on AWS. Solo builders and small data squads use it when they need S3 buckets for lakes or staging, EC2 for batch or ETL compute, and IAM wired correctly before pipelines go live. The skill emphasizes declarative, reviewable definitions you can promote across environments rather than one-off console setup. It fits indie founders standing up a first warehouse footprint as well as operators who must keep prod and non-prod aligned.
Pair it with your existing Terraform toolchain and AWS credentials; outputs are module-style guidance and resource patterns, not a hosted control plane. Use when triggers mention Terraform for data platform setup, provisioning AWS for data pipelines, or infrastructure as code for analytics workloads.
1 install

52. Terraform Iac Data Engineering
terraform-iac-data-engineering teaches agents how to manage AWS infrastructure for data engineering with Terraform. Solo builders standing up lakes, batch jobs, or pipeline hosts get opinionated patterns for S3 storage, EC2 compute, IAM access, and state handling instead of copying random HCL snippets. The skill aligns with common triggers such as setting up Terraform for data engineering, provisioning S3 and EC2, and managing resources for pipelines. It assumes you install Terraform and AWS CLI locally and wire credentials responsibly. Use it when you are codifying data platform foundations you will evolve through Ship and Operate, not for one-line console clicks.
1 install