
Parallel Data Enrichment
Bulk-enrich company, people, or product lists from CSV or inline input with web-sourced fields via parallel-cli.
Overview
Parallel Data Enrichment is an agent skill, most often used in the Idea phase (also Validate and Grow), that bulk-adds web-sourced columns to entity lists using parallel-cli.
Install
npx skills add https://github.com/parallel-web/parallel-agent-skills --skill parallel-data-enrichment
What is this skill?
- Bulk enrichment via parallel-cli enrich run with CSV or inline entities
- Optional parallel-cli enrich suggest to propose enriched_columns before a run
- Multi-turn context via --previous-interaction-id from prior research tasks
- User-invocable with argument-hint for file or entities plus fields to add
- Warns users that runtime scales with row count and field count
Adoption & trust: 8k installs on skills.sh; 56 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have a bare list of companies or people and need credible extra fields without manually researching every row.
Who is it for?
Builders with parallel-cli installed who want agent-driven CSV enrichment for market maps, lead lists, or competitive sets.
Skip if: you need real-time, in-app user enrichment at request latency, or your environment blocks shell or outbound network access.
When should I use this skill?
User needs bulk web-sourced fields added to companies, people, or products in a CSV or inline list via parallel-cli.
What do I get? / Deliverables
You receive an enriched dataset with agent-specified columns, optionally after a suggest pass that defines enriched_columns and processor for the run.
- Enriched CSV or dataset with requested columns
- Optional suggest JSON mapping for enriched_columns and processor
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
List enrichment usually starts when you are researching markets, competitors, or leads before committing to build or outreach. Research is the canonical shelf for adding CEO names, funding, and contact fields to raw entity lists.
Where it fits
- Enrich a competitor CSV with funding and leadership fields before positioning notes.
- Add firmographic columns to a prospect list to sanity-check segment size for an MVP niche.
- Refresh outreach lists with updated contact and funding fields before a campaign send.
How it compares
A CLI-backed bulk enrichment skill, not a hosted CRM sync or a one-off browser research prompt.
Common Questions / FAQ
Who is parallel-data-enrichment for?
Solo founders and indie operators who maintain lead or market lists and want agents to run Parallel enrichment jobs from the terminal.
When should I use parallel-data-enrichment?
Use it during Idea-phase competitor and audience research, during Validate-phase scoping when sizing segments from lists, and during Grow-phase lifecycle prep when updating outreach or CRM fields.
Is parallel-data-enrichment safe to install?
The skill invokes parallel-cli over the network on your data; review the Security Audits panel on this Prism page and treat enriched PII according to your compliance needs.
SKILL.md - Parallel Data Enrichment
# Data Enrichment

Enrich: $ARGUMENTS

## Before starting

Inform the user that enrichment may take several minutes depending on the number of rows and fields requested.

## Optional: Suggest output columns

If the user gave a vague intent ("enrich these companies with useful info") and you're not sure what columns to add, ask the API for a suggestion before kicking off the run:

```bash
parallel-cli enrich suggest "Find CEO and recent funding info" --json
```

The response is an envelope: `{title, processor, enriched_columns, warnings}`. Extract just the **`enriched_columns` array** (not the whole envelope) and pass it as the value of `--enriched-columns` on `enrich run`, **in place of `--intent`**. The two flags are alternative ways to specify what to enrich and are not combined. If `suggest` returned a `processor`, pass it through explicitly via `--processor` on the `run` call (it's a tuned recommendation for the schema).

Skip this whole section if the user already specified the fields they want.

> `enrich suggest` requires `parallel-cli` ≥ 0.3.0. If it errors with anything resembling `no such command` / `No such command` / `unknown command`, **do not bail**. Instead, skip the suggestion step, fall through to step 1 with `--intent`, complete the run, and mention `parallel-cli update` (or `pipx upgrade parallel-web-tools`) in the final response so the user picks up the feature next time.

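
The hand-off from `suggest` to `run` described above can be sketched in Python. This is a minimal illustration, not part of the skill: `build_run_args` and `suggest_unavailable` are hypothetical helpers, the shape of each `enriched_columns` item and the processor value are made up, and it assumes `--enriched-columns` accepts the array serialized as a JSON string. Only flags named in this document are used.

```python
import json

# Error fragments the skill treats as "enrich suggest is missing in this CLI version".
MISSING_COMMAND_MARKERS = ("no such command", "unknown command")


def suggest_unavailable(stderr: str) -> bool:
    """Return True when the error text resembles a missing-subcommand error."""
    lowered = stderr.lower()
    return any(marker in lowered for marker in MISSING_COMMAND_MARKERS)


def build_run_args(data, target, envelope=None, intent=None):
    """Build argv for `parallel-cli enrich run` (hypothetical helper).

    With a suggest envelope, pass --enriched-columns (and --processor if present);
    otherwise fall back to --intent. The two are never combined.
    """
    args = ["parallel-cli", "enrich", "run",
            "--data", json.dumps(data),
            "--target", target]
    if envelope is not None:
        # Pass only the enriched_columns array, not the whole envelope.
        # Assumption: the flag takes the array as a JSON string.
        args += ["--enriched-columns", json.dumps(envelope["enriched_columns"])]
        if envelope.get("processor"):
            args += ["--processor", envelope["processor"]]
    elif intent:
        args += ["--intent", intent]
    else:
        raise ValueError("need either a suggest envelope or an intent")
    return args + ["--no-wait", "--json"]


if __name__ == "__main__":
    # Illustrative envelope; the item shape and processor name are invented.
    envelope = {
        "title": "CEO and funding",
        "processor": "example-processor",
        "enriched_columns": [{"name": "ceo", "description": "Current CEO"}],
        "warnings": [],
    }
    print(build_run_args([{"company": "Google"}], "output.csv", envelope=envelope))
```

If `suggest_unavailable(stderr)` returns true, the same helper can be called with `intent=...` and no envelope, which matches the fallback path in the note above.
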
## Step 1: Start the enrichment

Use ONE of these command patterns (substitute user's actual data):

For inline data:

```bash
parallel-cli enrich run --data '[{"company": "Google"}, {"company": "Microsoft"}]' --intent "CEO name and founding year" --target "output.csv" --no-wait --json
```

For CSV file:

```bash
parallel-cli enrich run --source-type csv --source "input.csv" --target "output.csv" --source-columns '[{"name": "company", "description": "Company name"}]' --intent "CEO name and founding year" --no-wait --json
```

If this is a **follow-up** to a previous research task and you have its `interaction_id`, add context chaining:

```bash
parallel-cli enrich run --data '...' --intent "..." --target "output.csv" --no-wait --json --previous-interaction-id "$INTERACTION_ID"
```

The enrichment will run with the full context of that prior research, so you can enrich entities discovered earlier without restating what was already found.

Note: enrichment does **not** itself produce a new `interaction_id`, so you cannot chain a further follow-up off of an enrichment.

**IMPORTANT:** Always include `--no-wait` so the command returns immediately instead of blocking. Parse the `--json` output to extract `taskgroup_id` and `url`. The output is `{taskgroup_id, url, num_runs}`; there is no `interaction_id` field, so do not look for one.

Immediately tell the user:
- Enrichment has been kicked off
- The monitoring URL where they can track progress

Tell them they can background the polling step to continue working while it runs.

## Step 2: Poll for results

Pick a concrete output path (e.g., `/tmp/enrichment-acme.json`). Note: the file is JSON regardless of the extension you choose. It's an array of `{input, output}` objects, not a CSV. Name it `.json` to avoid confusing yourself or the user.

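
The two JSON shapes described above (the kickoff output and the polled result file) could be handled roughly as follows. This is a sketch under stated assumptions, not the skill's own code: `parse_kickoff` and `results_to_csv` are hypothetical helpers, the IDs, URL, and names are invented, and it assumes each record's `input` and `output` are flat objects mapping column name to value.

```python
import csv
import io
import json


def parse_kickoff(stdout: str) -> dict:
    """Extract taskgroup_id and url from `enrich run --no-wait --json` output."""
    payload = json.loads(stdout)
    # The output is {taskgroup_id, url, num_runs}; there is no interaction_id.
    return {"taskgroup_id": payload["taskgroup_id"], "url": payload["url"]}


def results_to_csv(records: list) -> str:
    """Flatten [{input, output}, ...] into CSV text.

    Assumption: `input` and `output` are flat objects (column -> value).
    """
    rows = [{**rec["input"], **rec["output"]} for rec in records]
    # Keep columns in first-seen order across all rows.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing values are written as empty cells
    return buffer.getvalue()


if __name__ == "__main__":
    # Invented sample values for illustration only.
    kickoff = parse_kickoff(
        '{"taskgroup_id": "tg-123", "url": "https://example.invalid/tg-123", "num_runs": 1}'
    )
    print("Track progress at:", kickoff["url"])
    sample = [{"input": {"company": "Acme"}, "output": {"ceo_name": "Jane Doe"}}]
    print(results_to_csv(sample))
```

`results_to_csv` only becomes useful once the poll command below has written the result file; load that file with `json.load` and pass the resulting list in if the user wants a CSV view.
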
```bash
parallel-cli enrich poll "$TASKGROUP_ID" --timeout 540 --output "/tmp/enrichment-<descriptive-name>.json"
```

Important:
- Use `--timeout 540` (9 minutes) to stay within tool execution limits
- The `--target` from step 1 is unused in `--no-wai