Claude Agent Skill · by LangChain AI

LangSmith Dataset

Install the LangSmith Dataset skill for Claude Code from langchain-ai/langsmith-skills.

Install
Terminal · npx
$ npx skills add https://github.com/langchain-ai/langsmith-skills --skill langsmith-dataset
Works with Paperclip

How LangSmith Dataset fits into a Paperclip company.

LangSmith Dataset drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat: no prompt engineering, no tool wiring.

Source file

SKILL.md (300 lines)
---
name: langsmith-dataset
description: "INVOKE THIS SKILL when creating evaluation datasets, uploading datasets to LangSmith, or managing existing datasets. Covers dataset types (final_response, single_step, trajectory, RAG), CLI management commands, SDK-based creation, and example management. Uses the langsmith CLI tool."
---

<oneliner>Create, manage, and upload evaluation datasets to LangSmith for testing and validation.</oneliner>

<setup>
Environment Variables

```bash
LANGSMITH_API_KEY=lsv2_pt_your_api_key_here           # REQUIRED
LANGSMITH_PROJECT=your-project-name                   # Check this to know which project has traces
LANGSMITH_WORKSPACE_ID=your-workspace-id              # Optional: for org-scoped keys
```

Authentication is REQUIRED: either set the `LANGSMITH_API_KEY` environment variable, or pass the `--api-key` flag to CLI commands (preferred):

```bash
langsmith dataset list --api-key $LANGSMITH_API_KEY
```

**IMPORTANT:** Always check the environment variables or `.env` file for `LANGSMITH_PROJECT` before querying or interacting with LangSmith. This tells you which project contains the relevant traces and data. If the LangSmith project is not available, use your best judgement to identify the right one.

Python Dependencies

```bash
pip install langsmith
```

JavaScript Dependencies

```bash
npm install langsmith
```

CLI Tool

```bash
curl -sSL https://raw.githubusercontent.com/langchain-ai/langsmith-cli/main/scripts/install.sh | sh
```
</setup>

<usage>
Use the `langsmith` CLI to manage datasets and examples.
### Dataset Commands

- `langsmith dataset list` - List datasets in LangSmith
- `langsmith dataset get <name-or-id>` - View dataset details
- `langsmith dataset create --name <name>` - Create a new empty dataset
- `langsmith dataset delete <name-or-id>` - Delete a dataset
- `langsmith dataset export <name-or-id> <output-file>` - Export dataset to local JSON file
- `langsmith dataset upload <file> --name <name>` - Upload a local JSON file as a dataset

### Example Commands

- `langsmith example list --dataset <name>` - List examples in a dataset
- `langsmith example create --dataset <name> --inputs <json>` - Add an example to a dataset
- `langsmith example delete <example-id>` - Delete an example

### Experiment Commands

- `langsmith experiment list --dataset <name>` - List experiments for a dataset
- `langsmith experiment get <name>` - View experiment results

### Common Flags

- `--limit N` - Limit number of results
- `--yes` - Skip confirmation prompts (use with caution)

**IMPORTANT - Safety Prompts:**
- The CLI prompts for confirmation before destructive operations (delete, overwrite)
- **If you are running with user input:** ALWAYS wait for user input; NEVER use `--yes` unless the user explicitly requests it
- **If you are running non-interactively:** Use `--yes` to skip confirmation prompts
</usage>

<dataset_types_overview>
Common evaluation dataset types:

- **final_response** - Full conversation with expected output. Tests complete agent behavior.
- **single_step** - Single node inputs/outputs. Tests specific node behavior (e.g., one LLM call or tool).
- **trajectory** - Tool call sequence. Tests execution path (ordered list of tool names).
- **rag** - Question/chunks/answer/citations. Tests retrieval quality.
</dataset_types_overview>

<creating_datasets>
## Creating Datasets

Datasets are JSON files with an array of examples. Each example has `inputs` and `outputs`.
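As a concrete illustration of that shape, the sketch below builds a minimal dataset file by hand (the queries, answers, and file path are hypothetical, not part of the skill):

```python
import json

# A dataset is just an array of examples; each example pairs an
# "inputs" object with an (optional) "outputs" object.
examples = [
    {"inputs": {"query": "What is AI?"}, "outputs": {"answer": "AI is..."}},
    {"inputs": {"query": "Explain RAG"}, "outputs": {"answer": "RAG is..."}},
]

# Save locally; the resulting file can then be uploaded with
# `langsmith dataset upload /tmp/handwritten_dataset.json --name "..."`.
with open("/tmp/handwritten_dataset.json", "w") as f:
    json.dump(examples, f, indent=2)
```

Hand-writing a few examples like this is often enough to smoke-test an evaluator before investing in trace-derived datasets.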
### From Exported Traces (Programmatic)

Export traces first, then process them into dataset format using code:

```bash
# 1. Export traces to JSONL files
langsmith trace export ./traces --project my-project --limit 20 --full --api-key $LANGSMITH_API_KEY
```

<python>
```python
import json
from pathlib import Path
from langsmith import Client

client = Client()

# 2. Process traces into dataset examples
examples = []
for jsonl_file in Path("./traces").glob("*.jsonl"):
    runs = [json.loads(line) for line in jsonl_file.read_text().strip().split("\n")]
    root = next((r for r in runs if r.get("parent_run_id") is None), None)
    if root and root.get("inputs") and root.get("outputs"):
        examples.append({
            "trace_id": root.get("trace_id"),
            "inputs": root["inputs"],
            "outputs": root["outputs"]
        })

# 3. Save locally
with open("/tmp/dataset.json", "w") as f:
    json.dump(examples, f, indent=2)
```
</python>

<typescript>
```typescript
import { Client } from "langsmith";
import { readFileSync, writeFileSync, readdirSync } from "fs";
import { join } from "path";

const client = new Client();

// 2. Process traces into dataset examples
const examples: Array<{trace_id?: string, inputs: Record<string, any>, outputs: Record<string, any>}> = [];
const files = readdirSync("./traces").filter(f => f.endsWith(".jsonl"));

for (const file of files) {
  const lines = readFileSync(join("./traces", file), "utf-8").trim().split("\n");
  const runs = lines.map(line => JSON.parse(line));
  const root = runs.find(r => r.parent_run_id == null);
  if (root?.inputs && root?.outputs) {
    examples.push({ trace_id: root.trace_id, inputs: root.inputs, outputs: root.outputs });
  }
}

// 3. Save locally
writeFileSync("/tmp/dataset.json", JSON.stringify(examples, null, 2));
```
</typescript>

### Upload to LangSmith

```bash
# Upload local JSON file as a dataset
langsmith dataset upload /tmp/dataset.json --name "My Evaluation Dataset" --api-key $LANGSMITH_API_KEY
```

### Using the SDK Directly

<python>
```python
from langsmith import Client

client = Client()

# Create dataset and add examples in one step
dataset = client.create_dataset("My Dataset", description="Evaluation dataset")

client.create_examples(
    inputs=[{"query": "What is AI?"}, {"query": "Explain RAG"}],
    outputs=[{"answer": "AI is..."}, {"answer": "RAG is..."}],
    dataset_name="My Dataset",
)
```
</python>

<typescript>
```typescript
import { Client } from "langsmith";

const client = new Client();

// Create dataset and add examples
const dataset = await client.createDataset("My Dataset", {
  description: "Evaluation dataset",
});

await client.createExamples({
  inputs: [{ query: "What is AI?" }, { query: "Explain RAG" }],
  outputs: [{ answer: "AI is..." }, { answer: "RAG is..." }],
  datasetName: "My Dataset",
});
```
</typescript>
</creating_datasets>

<dataset_structures>
## Dataset Structures by Type

### Final Response

```json
{"trace_id": "...", "inputs": {"query": "What are the top genres?"}, "outputs": {"response": "The top genres are..."}}
```

### Single Step

```json
{"trace_id": "...", "inputs": {"messages": [...]}, "outputs": {"content": "..."}, "metadata": {"node_name": "model"}}
```

### Trajectory

```json
{"trace_id": "...", "inputs": {"query": "..."}, "outputs": {"expected_trajectory": ["tool_a", "tool_b", "tool_c"]}}
```

### RAG

```json
{"trace_id": "...", "inputs": {"question": "How do I..."}, "outputs": {"answer": "...", "retrieved_chunks": ["..."], "cited_chunks": ["..."]}}
```
</dataset_structures>

<script_usage>
## CLI Usage

```bash
# List all datasets
langsmith dataset list --api-key $LANGSMITH_API_KEY

# Get dataset details
langsmith dataset get "My Dataset" --api-key $LANGSMITH_API_KEY

# Create an empty dataset
langsmith dataset create --name "New Dataset" --description "For evaluation" --api-key $LANGSMITH_API_KEY

# Upload a local JSON file
langsmith dataset upload /tmp/dataset.json --name "My Dataset" --api-key $LANGSMITH_API_KEY

# Export a dataset to local file
langsmith dataset export "My Dataset" /tmp/exported.json --limit 100 --api-key $LANGSMITH_API_KEY

# Delete a dataset
langsmith dataset delete "My Dataset" --api-key $LANGSMITH_API_KEY

# List examples in a dataset
langsmith example list --dataset "My Dataset" --limit 10 --api-key $LANGSMITH_API_KEY

# Add an example
langsmith example create --dataset "My Dataset" \
  --inputs '{"query": "test"}' \
  --outputs '{"answer": "result"}' --api-key $LANGSMITH_API_KEY

# List experiments
langsmith experiment list --dataset "My Dataset" --api-key $LANGSMITH_API_KEY
langsmith experiment get "eval-v1" --api-key $LANGSMITH_API_KEY
```
</script_usage>

<example_workflow>
Complete workflow from traces to uploaded LangSmith dataset:

```bash
# 1. Export traces from LangSmith
langsmith trace export ./traces --project my-project --limit 20 --full --api-key $LANGSMITH_API_KEY

# 2. Process traces into dataset format (using Python/JS code)
# See "Creating Datasets" section above

# 3. Upload to LangSmith
langsmith dataset upload /tmp/final_response.json --name "Skills: Final Response" --api-key $LANGSMITH_API_KEY
langsmith dataset upload /tmp/trajectory.json --name "Skills: Trajectory" --api-key $LANGSMITH_API_KEY

# 4. Verify upload
langsmith dataset list --api-key $LANGSMITH_API_KEY
langsmith dataset get "Skills: Final Response" --api-key $LANGSMITH_API_KEY
langsmith example list --dataset "Skills: Final Response" --limit 3 --api-key $LANGSMITH_API_KEY

# 5. Run experiments
langsmith experiment list --dataset "Skills: Final Response" --api-key $LANGSMITH_API_KEY
```
</example_workflow>

<troubleshooting>
**Dataset upload fails:**
- Verify LANGSMITH_API_KEY is set
- Check JSON file is valid: each element needs `inputs` (and optionally `outputs`)
- Dataset name must be unique, or delete the existing dataset first with `langsmith dataset delete`

**Empty dataset after upload:**
- Verify the JSON file contains an array of objects with an `inputs` key
- Check the dataset isn't empty: `langsmith example list --dataset "Name"`

**Export has no data:**
- Ensure traces were exported with the `--full` flag to include inputs/outputs
- Verify traces have both `inputs` and `outputs` populated

**Example count mismatch:**
- Use `langsmith dataset get "Name"` to check the remote count
- Compare with the local file to verify upload completeness
</troubleshooting>
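When an upload fails or comes back empty, a quick local sanity check can confirm the file has the shape the troubleshooting list describes before retrying. The helper below is a sketch (the function name and checks are not part of the CLI or SDK):

```python
import json
from pathlib import Path

def check_dataset_file(path: str) -> list[str]:
    """Return a list of problems found in a local dataset JSON file.

    Mirrors the structural requirements above: the file must contain a
    non-empty array of objects, each with an "inputs" key ("outputs" is
    optional).
    """
    try:
        data = json.loads(Path(path).read_text())
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot parse {path}: {exc}"]
    if not isinstance(data, list):
        return ["top-level value must be an array of examples"]
    problems = []
    if not data:
        problems.append("file contains no examples")
    for i, example in enumerate(data):
        if not isinstance(example, dict) or "inputs" not in example:
            problems.append(f"example {i} is missing an 'inputs' object")
    return problems
```

An empty return value means the file at least satisfies the structural requirements; it does not guarantee the examples are semantically correct for your evaluator.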