
Langsmith Dataset
Create and maintain LangSmith evaluation datasets so solo builders can regression-test agents and RAG flows before shipping.
Overview
Langsmith-dataset is an agent skill most often used in Ship (also Build agent-tooling and Validate prototype) that creates, uploads, and manages LangSmith evaluation datasets via CLI and SDK.
Install
npx skills add https://github.com/langchain-ai/langsmith-skills --skill langsmith-datasetWhat is this skill?
- Covers four dataset types: final_response, single_step, trajectory, and RAG
- LangSmith CLI install plus dataset list/create/upload management commands
- Python (langsmith) and JavaScript SDK paths for programmatic dataset creation
- Example management and project-scoped workflows tied to LANGSMITH_PROJECT traces
- Requires LANGSMITH_API_KEY via env or --api-key on every CLI interaction
- Four dataset types documented: final_response, single_step, trajectory, and RAG
- Supports LangSmith CLI plus Python and JavaScript SDK creation paths
Adoption & trust: 2.2k installs on skills.sh; 130 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have agent or RAG behavior you want to regression-test but no organized LangSmith dataset or consistent way to add and version examples.
Who is it for?
Indie builders standardizing LangSmith eval datasets for agents, chains, or RAG after you have a LangSmith project and API key.
Skip if: Teams that only need one-off manual chat tests, do not use LangSmith, or want generic unit tests with no LLM eval platform.
When should I use this skill?
Creating evaluation datasets, uploading datasets to LangSmith, or managing existing datasets and examples.
What do I get? / Deliverables
You end up with typed LangSmith datasets, managed examples, and CLI/SDK workflows ready to plug into eval runs and trace-backed debugging.
- LangSmith datasets with chosen type and managed examples
- Documented CLI/SDK commands for list, create, and upload workflows
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Evaluation datasets sit on the critical path to confident releases—canonical shelf is Ship because the skill’s one-liner targets testing and validation, not one-off prototyping. Testing is where you define examples, dataset types, and upload runs; LangSmith dataset CLI/SDK work is prep for eval harnesses and CI-style agent checks.
Where it fits
Seed a small final_response dataset from prototype outputs to decide if the agent is ready to build out fully.
Define trajectory and single_step datasets while wiring tools so future runs compare against the same golden paths.
Upload and version RAG QA pairs before release so prompt or retriever changes trigger repeatable LangSmith evals.
Append production failure cases into an existing dataset after reviewing LANGSMITH_PROJECT traces.
How it compares
Use this LangSmith dataset skill instead of scattering eval JSON in repo folders without a hosted eval store and trace linkage.
Common Questions / FAQ
Who is langsmith-dataset for?
Solo and indie builders using Claude Code, Cursor, or Codex who ship LLM or agent products and want LangSmith-backed evaluation datasets rather than informal prompt spot-checks.
When should I use langsmith-dataset?
During Ship testing when you formalize regression suites; during Build agent-tooling when you stand up eval infrastructure; and during Validate prototype when you prove behavior with small labeled example sets before a full build.
Is langsmith-dataset safe to install?
It expects a LangSmith API key and network access to LangSmith—treat keys as secrets and review the Security Audits panel on this Prism page before trusting the skill package in your environment.
SKILL.md
READMESKILL.md - Langsmith Dataset
<oneliner> Create, manage, and upload evaluation datasets to LangSmith for testing and validation. </oneliner> <setup> Environment Variables ```bash LANGSMITH_API_KEY=lsv2_pt_your_api_key_here # REQUIRED LANGSMITH_PROJECT=your-project-name # Check this to know which project has traces LANGSMITH_WORKSPACE_ID=your-workspace-id # Optional: for org-scoped keys ``` Authentication is REQUIRED: either set the `LANGSMITH_API_KEY` environment variable, or pass the `--api-key` flag to CLI commands (preferred): ```bash langsmith dataset list --api-key $LANGSMITH_API_KEY ``` **IMPORTANT:** Always check the environment variables or `.env` file for `LANGSMITH_PROJECT` before querying or interacting with LangSmith. This tells you which project contains the relevant traces and data. If the LangSmith project is not available, use your best judgement to identify the right one. Python Dependencies ```bash pip install langsmith ``` JavaScript Dependencies ```bash npm install langsmith ``` CLI Tool ```bash curl -sSL https://raw.githubusercontent.com/langchain-ai/langsmith-cli/main/scripts/install.sh | sh ``` </setup> <usage> Use the `langsmith` CLI to manage datasets and examples. ### Dataset Commands - `langsmith dataset list` - List datasets in LangSmith - `langsmith dataset get <name-or-id>` - View dataset details - `langsmith dataset create --name <name>` - Create a new empty dataset - `langsmith dataset delete <name-or-id>` - Delete a dataset - `langsmith dataset export <name-or-id> <output-file>` - Export dataset to local JSON file - `langsmith dataset upload <file> --name <name>` - Upload a local JSON file as a dataset ### Example Commands - `langsmith example list --dataset <name>` - List examples in a dataset - `langsmith example create --dataset <name> --inputs <json>` - Add an example to a dataset - `langsmith example delete <example-id>` - Delete an example ### Experiment Commands - `langsmith experiment list --dataset <name>` - List experiments for a dataset - `langsmith experiment get <name>` - View experiment results ### Common Flags - `--limit N` - Limit number of results - `--yes` - Skip confirmation prompts (use with caution) **IMPORTANT - Safety Prompts:** - The CLI prompts for confirmation before destructive operations (delete, overwrite) - **If you are running with user input:** ALWAYS wait for user input; NEVER use `--yes` unless the user explicitly requests it - **If you are running non-interactively:** Use `--yes` to skip confirmation prompts </usage> <dataset_types_overview> Common evaluation dataset types: - **final_response** - Full conversation with expected output. Tests complete agent behavior. - **single_step** - Single node inputs/outputs. Tests specific node behavior (e.g., one LLM call or tool). - **trajectory** - Tool call sequence. Tests execution path (ordered list of tool names). - **rag** - Question/chunks/answer/citations. Tests retrieval quality. </dataset_types_overview> <creating_datasets> ## Creating Datasets Datasets are JSON files with an array of examples. Each example has `inputs` and `outputs`. ### From Exported Traces (Programmatic) Export traces first, then process them into dataset format using code: ```bash # 1. Export traces to JSONL files langsmith trace export ./traces --project my-project --limit 20 --full --api-key $LANGSMITH_API_KEY ``` <python> ```python import json from pathlib import Path from langsmith import Client client = Client() # 2. Process traces into dataset examples examples = [] for jsonl_file in Path("./traces").glob("*.jsonl"): runs = [json.loads(line) for line in jsonl_file.re