Langsmith Dataset

Name: Langsmith Dataset
Author: langchain-ai

langchain-ai/langsmith-skills

Create and maintain LangSmith evaluation datasets so solo builders can regression-test agents and RAG flows before shipping.

Overview

Langsmith-dataset is an agent skill most often used in Ship (also Build agent-tooling and Validate prototype) that creates, uploads, and manages LangSmith evaluation datasets via CLI and SDK.

Install

npx skills add https://github.com/langchain-ai/langsmith-skills --skill langsmith-dataset

What is this skill?

Covers four dataset types: final_response, single_step, trajectory, and RAG
LangSmith CLI install plus dataset list/create/upload management commands
Python (langsmith) and JavaScript SDK paths for programmatic dataset creation
Example management and project-scoped workflows tied to LANGSMITH_PROJECT traces
Requires LANGSMITH_API_KEY via env or --api-key on every CLI interaction
Four dataset types documented: final_response, single_step, trajectory, and RAG
Supports LangSmith CLI plus Python and JavaScript SDK creation paths

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 2.2k installs on skills.sh; 130 GitHub stars; 2/3 security scanners passed (skills.sh audits).

What problem does it solve?

You have agent or RAG behavior you want to regression-test but no organized LangSmith dataset or consistent way to add and version examples.

Who is it for?

Indie builders standardizing LangSmith eval datasets for agents, chains, or RAG after you have a LangSmith project and API key.

Skip if: Teams that only need one-off manual chat tests, do not use LangSmith, or want generic unit tests with no LLM eval platform.

When should I use this skill?

Creating evaluation datasets, uploading datasets to LangSmith, or managing existing datasets and examples.

What do I get? / Deliverables

You end up with typed LangSmith datasets, managed examples, and CLI/SDK workflows ready to plug into eval runs and trace-backed debugging.

LangSmith datasets with chosen type and managed examples
Documented CLI/SDK commands for list, create, and upload workflows

Recommended Skills

Microsoft Foundrymicrosoft/azure-skills

Microsoft Foundry skill guides agents through the full Azure AI Foundry lifecycle—containerizing agents, pushing to ACR,…377k installs·1.2k stars

Azure Aimicrosoft/azure-skills

azure-ai is a Prism-oriented quick reference for Microsoft Azure AI work, with the published body centered on the Azure …375k installs·1.2k stars

Azure Hosted Copilot Sdkmicrosoft/azure-skills

Azure Hosted Copilot SDK is Microsoft's entry skill for repos using @github/copilot-sdk—it detects CopilotClient usage, …346k installs·1.2k stars

Lark Eventlarksuite/cli

Lark real-time subscription skill via lark-cli event consume for building bots and streaming webhook-style agent workers…208k installs·13.7k stars

Running Claude Code Via Litellm Copilotxixu-me/skills

Running Claude Code via LiteLLM Copilot walks through pointing Claude Code at a local LiteLLM proxy that forwards Anthro…200k installs·61 stars

Setup Matt Pocock Skillsmattpocock/skills

One-time per-repo setup so Matt Pocock engineering skills share correct issue tracker, triage strings, and domain docume…180k installs·121k stars

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Evaluation datasets sit on the critical path to confident releases—canonical shelf is Ship because the skill’s one-liner targets testing and validation, not one-off prototyping. Testing is where you define examples, dataset types, and upload runs; LangSmith dataset CLI/SDK work is prep for eval harnesses and CI-style agent checks.

Also useful

BuildAgent skills & templates

Also useful

ValidatePrototype & spike

Where it fits

Example use

ValidatePrototype & spike

Seed a small final_response dataset from prototype outputs to decide if the agent is ready to build out fully.

Example use

BuildAgent skills & templates

Define trajectory and single_step datasets while wiring tools so future runs compare against the same golden paths.

Example use

ShipTesting & QA

Upload and version RAG QA pairs before release so prompt or retriever changes trigger repeatable LangSmith evals.

Example use

OperateIteration & experiments

Append production failure cases into an existing dataset after reviewing LANGSMITH_PROJECT traces.

How it compares

Use this LangSmith dataset skill instead of scattering eval JSON in repo folders without a hosted eval store and trace linkage.

Common Questions / FAQ

Who is langsmith-dataset for?

Solo and indie builders using Claude Code, Cursor, or Codex who ship LLM or agent products and want LangSmith-backed evaluation datasets rather than informal prompt spot-checks.

When should I use langsmith-dataset?

During Ship testing when you formalize regression suites; during Build agent-tooling when you stand up eval infrastructure; and during Validate prototype when you prove behavior with small labeled example sets before a full build.

Is langsmith-dataset safe to install?

It expects a LangSmith API key and network access to LangSmith—treat keys as secrets and review the Security Audits panel on this Prism page before trusting the skill package in your environment.

SKILL.md

READMESKILL.md - Langsmith Dataset

<oneliner>
Create, manage, and upload evaluation datasets to LangSmith for testing and validation.
</oneliner>

<setup>
Environment Variables

```bash
LANGSMITH_API_KEY=lsv2_pt_your_api_key_here          # REQUIRED
LANGSMITH_PROJECT=your-project-name                   # Check this to know which project has traces
LANGSMITH_WORKSPACE_ID=your-workspace-id              # Optional: for org-scoped keys
```

Authentication is REQUIRED: either set the `LANGSMITH_API_KEY` environment variable, or pass the `--api-key` flag to CLI commands (preferred):
```bash
langsmith dataset list --api-key $LANGSMITH_API_KEY
```

**IMPORTANT:** Always check the environment variables or `.env` file for `LANGSMITH_PROJECT` before querying or interacting with LangSmith. This tells you which project contains the relevant traces and data. If the LangSmith project is not available, use your best judgement to identify the right one.

Python Dependencies
```bash
pip install langsmith
```

JavaScript Dependencies
```bash
npm install langsmith
```

CLI Tool

```bash
curl -sSL https://raw.githubusercontent.com/langchain-ai/langsmith-cli/main/scripts/install.sh | sh
```
</setup>

<usage>
Use the `langsmith` CLI to manage datasets and examples.

### Dataset Commands

- `langsmith dataset list` - List datasets in LangSmith
- `langsmith dataset get <name-or-id>` - View dataset details
- `langsmith dataset create --name <name>` - Create a new empty dataset
- `langsmith dataset delete <name-or-id>` - Delete a dataset
- `langsmith dataset export <name-or-id> <output-file>` - Export dataset to local JSON file
- `langsmith dataset upload <file> --name <name>` - Upload a local JSON file as a dataset

### Example Commands

- `langsmith example list --dataset <name>` - List examples in a dataset
- `langsmith example create --dataset <name> --inputs <json>` - Add an example to a dataset
- `langsmith example delete <example-id>` - Delete an example

### Experiment Commands

- `langsmith experiment list --dataset <name>` - List experiments for a dataset
- `langsmith experiment get <name>` - View experiment results

### Common Flags

- `--limit N` - Limit number of results
- `--yes` - Skip confirmation prompts (use with caution)

**IMPORTANT - Safety Prompts:**
- The CLI prompts for confirmation before destructive operations (delete, overwrite)
- **If you are running with user input:** ALWAYS wait for user input; NEVER use `--yes` unless the user explicitly requests it
- **If you are running non-interactively:** Use `--yes` to skip confirmation prompts
</usage>

<dataset_types_overview>
Common evaluation dataset types:

- **final_response** - Full conversation with expected output. Tests complete agent behavior.
- **single_step** - Single node inputs/outputs. Tests specific node behavior (e.g., one LLM call or tool).
- **trajectory** - Tool call sequence. Tests execution path (ordered list of tool names).
- **rag** - Question/chunks/answer/citations. Tests retrieval quality.
</dataset_types_overview>

<creating_datasets>
## Creating Datasets

Datasets are JSON files with an array of examples. Each example has `inputs` and `outputs`.

### From Exported Traces (Programmatic)

Export traces first, then process them into dataset format using code:

```bash
# 1. Export traces to JSONL files
langsmith trace export ./traces --project my-project --limit 20 --full --api-key $LANGSMITH_API_KEY
```

<python>
```python
import json
from pathlib import Path
from langsmith import Client

client = Client()

# 2. Process traces into dataset examples
examples = []
for jsonl_file in Path("./traces").glob("*.jsonl"):
    runs = [json.loads(line) for line in jsonl_file.re

What is this skill?

Covers four dataset types: final_response, single_step, trajectory, and RAG

LangSmith CLI install plus dataset list/create/upload management commands

Python (langsmith) and JavaScript SDK paths for programmatic dataset creation

Example management and project-scoped workflows tied to LANGSMITH_PROJECT traces

Requires LANGSMITH_API_KEY via env or --api-key on every CLI interaction

Four dataset types documented: final_response, single_step, trajectory, and RAG

Supports LangSmith CLI plus Python and JavaScript SDK creation paths

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 2.2k installs on skills.sh; 130 GitHub stars; 2/3 security scanners passed (skills.sh audits).

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Also useful

BuildAgent skills & templates

Also useful

ValidatePrototype & spike

Where it fits

Example use

ValidatePrototype & spike

Seed a small final_response dataset from prototype outputs to decide if the agent is ready to build out fully.

Example use

BuildAgent skills & templates

Define trajectory and single_step datasets while wiring tools so future runs compare against the same golden paths.

Example use

ShipTesting & QA

Upload and version RAG QA pairs before release so prompt or retriever changes trigger repeatable LangSmith evals.

Example use

OperateIteration & experiments

Append production failure cases into an existing dataset after reviewing LANGSMITH_PROJECT traces.

SKILL.md

READMESKILL.md - Langsmith Dataset

<oneliner>
Create, manage, and upload evaluation datasets to LangSmith for testing and validation.
</oneliner>

<setup>
Environment Variables

```bash
LANGSMITH_API_KEY=lsv2_pt_your_api_key_here          # REQUIRED
LANGSMITH_PROJECT=your-project-name                   # Check this to know which project has traces
LANGSMITH_WORKSPACE_ID=your-workspace-id              # Optional: for org-scoped keys
```

Authentication is REQUIRED: either set the `LANGSMITH_API_KEY` environment variable, or pass the `--api-key` flag to CLI commands (preferred):
```bash
langsmith dataset list --api-key $LANGSMITH_API_KEY
```

**IMPORTANT:** Always check the environment variables or `.env` file for `LANGSMITH_PROJECT` before querying or interacting with LangSmith. This tells you which project contains the relevant traces and data. If the LangSmith project is not available, use your best judgement to identify the right one.

Python Dependencies
```bash
pip install langsmith
```

JavaScript Dependencies
```bash
npm install langsmith
```

CLI Tool

```bash
curl -sSL https://raw.githubusercontent.com/langchain-ai/langsmith-cli/main/scripts/install.sh | sh
```
</setup>

<usage>
Use the `langsmith` CLI to manage datasets and examples.

### Dataset Commands

- `langsmith dataset list` - List datasets in LangSmith
- `langsmith dataset get <name-or-id>` - View dataset details
- `langsmith dataset create --name <name>` - Create a new empty dataset
- `langsmith dataset delete <name-or-id>` - Delete a dataset
- `langsmith dataset export <name-or-id> <output-file>` - Export dataset to local JSON file
- `langsmith dataset upload <file> --name <name>` - Upload a local JSON file as a dataset

### Example Commands

- `langsmith example list --dataset <name>` - List examples in a dataset
- `langsmith example create --dataset <name> --inputs <json>` - Add an example to a dataset
- `langsmith example delete <example-id>` - Delete an example

### Experiment Commands

- `langsmith experiment list --dataset <name>` - List experiments for a dataset
- `langsmith experiment get <name>` - View experiment results

### Common Flags

- `--limit N` - Limit number of results
- `--yes` - Skip confirmation prompts (use with caution)

**IMPORTANT - Safety Prompts:**
- The CLI prompts for confirmation before destructive operations (delete, overwrite)
- **If you are running with user input:** ALWAYS wait for user input; NEVER use `--yes` unless the user explicitly requests it
- **If you are running non-interactively:** Use `--yes` to skip confirmation prompts
</usage>

<dataset_types_overview>
Common evaluation dataset types:

- **final_response** - Full conversation with expected output. Tests complete agent behavior.
- **single_step** - Single node inputs/outputs. Tests specific node behavior (e.g., one LLM call or tool).
- **trajectory** - Tool call sequence. Tests execution path (ordered list of tool names).
- **rag** - Question/chunks/answer/citations. Tests retrieval quality.
</dataset_types_overview>

<creating_datasets>
## Creating Datasets

Datasets are JSON files with an array of examples. Each example has `inputs` and `outputs`.

### From Exported Traces (Programmatic)

Export traces first, then process them into dataset format using code:

```bash
# 1. Export traces to JSONL files
langsmith trace export ./traces --project my-project --limit 20 --full --api-key $LANGSMITH_API_KEY
```

<python>
```python
import json
from pathlib import Path
from langsmith import Client

client = Client()

# 2. Process traces into dataset examples
examples = []
for jsonl_file in Path("./traces").glob("*.jsonl"):
    runs = [json.loads(line) for line in jsonl_file.re

Overview

Install

What is this skill?

What problem does it solve?

Who is it for?

When should I use this skill?

What do I get? / Deliverables

Recommended Skills

Journey fit

Where it fits

Who is langsmith-dataset for?

When should I use langsmith-dataset?

Is langsmith-dataset safe to install?

SKILL.md

This week for builders

Overview

Install

What is this skill?

What problem does it solve?

Who is it for?

When should I use this skill?

What do I get? / Deliverables

Recommended Skills

Journey fit

Where it fits

Who is langsmith-dataset for?

When should I use langsmith-dataset?

Is langsmith-dataset safe to install?

SKILL.md