Claude Agent Skill · by LangChain AI

LangSmith Dataset

Install the LangSmith Dataset skill for Claude Code from langchain-ai/langsmith-skills.

Install
Terminal · npx
$ npx skills add https://github.com/langchain-ai/langsmith-skills --skill langsmith-dataset
Works with Paperclip

How LangSmith Dataset fits into a Paperclip company.

LangSmith Dataset drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat: no prompt engineering, no tool wiring.

Source file

SKILL.md (300 lines)
---
name: langsmith-dataset
description: "INVOKE THIS SKILL when creating evaluation datasets, uploading datasets to LangSmith, or managing existing datasets. Covers dataset types (final_response, single_step, trajectory, RAG), CLI management commands, SDK-based creation, and example management. Uses the langsmith CLI tool."
---

<oneliner>Create, manage, and upload evaluation datasets to LangSmith for testing and validation.</oneliner>

<setup>
Environment Variables

```bash
LANGSMITH_API_KEY=lsv2_pt_your_api_key_here           # REQUIRED
LANGSMITH_PROJECT=your-project-name                   # Check this to know which project has traces
LANGSMITH_WORKSPACE_ID=your-workspace-id              # Optional: for org-scoped keys
```

Authentication is REQUIRED: either set the `LANGSMITH_API_KEY` environment variable, or pass the `--api-key` flag to CLI commands (preferred):

```bash
langsmith dataset list --api-key $LANGSMITH_API_KEY
```

**IMPORTANT:** Always check the environment variables or `.env` file for `LANGSMITH_PROJECT` before querying or interacting with LangSmith. This tells you which project contains the relevant traces and data. If the LangSmith project is not available, use your best judgement to identify the right one.

Python Dependencies

```bash
pip install langsmith
```

JavaScript Dependencies

```bash
npm install langsmith
```

CLI Tool

```bash
curl -sSL https://raw.githubusercontent.com/langchain-ai/langsmith-cli/main/scripts/install.sh | sh
```
</setup>

<usage>
Use the `langsmith` CLI to manage datasets and examples.
### Dataset Commands

- `langsmith dataset list` - List datasets in LangSmith
- `langsmith dataset get <name-or-id>` - View dataset details
- `langsmith dataset create --name <name>` - Create a new empty dataset
- `langsmith dataset delete <name-or-id>` - Delete a dataset
- `langsmith dataset export <name-or-id> <output-file>` - Export dataset to local JSON file
- `langsmith dataset upload <file> --name <name>` - Upload a local JSON file as a dataset

### Example Commands

- `langsmith example list --dataset <name>` - List examples in a dataset
- `langsmith example create --dataset <name> --inputs <json>` - Add an example to a dataset
- `langsmith example delete <example-id>` - Delete an example

### Experiment Commands

- `langsmith experiment list --dataset <name>` - List experiments for a dataset
- `langsmith experiment get <name>` - View experiment results

### Common Flags

- `--limit N` - Limit number of results
- `--yes` - Skip confirmation prompts (use with caution)

**IMPORTANT - Safety Prompts:**
- The CLI prompts for confirmation before destructive operations (delete, overwrite)
- **If you are running with user input:** ALWAYS wait for user input; NEVER use `--yes` unless the user explicitly requests it
- **If you are running non-interactively:** Use `--yes` to skip confirmation prompts
</usage>

<dataset_types_overview>
Common evaluation dataset types:

- **final_response** - Full conversation with expected output. Tests complete agent behavior.
- **single_step** - Single node inputs/outputs. Tests specific node behavior (e.g., one LLM call or tool).
- **trajectory** - Tool call sequence. Tests execution path (ordered list of tool names).
- **rag** - Question/chunks/answer/citations. Tests retrieval quality.
</dataset_types_overview>

<creating_datasets>
## Creating Datasets

Datasets are JSON files with an array of examples. Each example has `inputs` and `outputs`.
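As a concrete illustration of that shape, the sketch below builds a minimal dataset file by hand (the queries, answers, and file path are hypothetical, not part of the skill):

```python
import json

# A dataset is just an array of examples; each example pairs an
# "inputs" object with an (optional) "outputs" object.
examples = [
    {"inputs": {"query": "What is AI?"}, "outputs": {"answer": "AI is..."}},
    {"inputs": {"query": "Explain RAG"}, "outputs": {"answer": "RAG is..."}},
]

# Save locally; the resulting file can then be uploaded with
# `langsmith dataset upload /tmp/handwritten_dataset.json --name "..."`.
with open("/tmp/handwritten_dataset.json", "w") as f:
    json.dump(examples, f, indent=2)
```

Hand-writing a few examples like this is often enough to smoke-test an evaluator before investing in trace-derived datasets.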
### From Exported Traces (Programmatic)

Export traces first, then process them into dataset format using code:

```bash
# 1. Export traces to JSONL files
langsmith trace export ./traces --project my-project --limit 20 --full --api-key $LANGSMITH_API_KEY
```

<python>
```python
import json
from pathlib import Path
from langsmith import Client

client = Client()

# 2. Process traces into dataset examples
examples = []
for jsonl_file in Path("./traces").glob("*.jsonl"):
    runs = [json.loads(line) for line in jsonl_file.read_text().strip().split("\n")]
    root = next((r for r in runs if r.get("parent_run_id") is None), None)
    if root and root.get("inputs") and root.get("outputs"):
        examples.append({
            "trace_id": root.get("trace_id"),
            "inputs": root["inputs"],
            "outputs": root["outputs"]
        })

# 3. Save locally
with open("/tmp/dataset.json", "w") as f:
    json.dump(examples, f, indent=2)
```
</python>

<typescript>
```typescript
import { Client } from "langsmith";
import { readFileSync, writeFileSync, readdirSync } from "fs";
import { join } from "path";

const client = new Client();

// 2. Process traces into dataset examples
const examples: Array<{trace_id?: string, inputs: Record<string, any>, outputs: Record<string, any>}> = [];
const files = readdirSync("./traces").filter(f => f.endsWith(".jsonl"));

for (const file of files) {
  const lines = readFileSync(join("./traces", file), "utf-8").trim().split("\n");
  const runs = lines.map(line => JSON.parse(line));
  const root = runs.find(r => r.parent_run_id == null);
  if (root?.inputs && root?.outputs) {
    examples.push({ trace_id: root.trace_id, inputs: root.inputs, outputs: root.outputs });
  }
}

// 3. Save locally
writeFileSync("/tmp/dataset.json", JSON.stringify(examples, null, 2));
```
</typescript>

### Upload to LangSmith

```bash
# Upload local JSON file as a dataset
langsmith dataset upload /tmp/dataset.json --name "My Evaluation Dataset" --api-key $LANGSMITH_API_KEY
```

### Using the SDK Directly

<python>
```python
from langsmith import Client

client = Client()

# Create dataset and add examples in one step
dataset = client.create_dataset("My Dataset", description="Evaluation dataset")

client.create_examples(
    inputs=[{"query": "What is AI?"}, {"query": "Explain RAG"}],
    outputs=[{"answer": "AI is..."}, {"answer": "RAG is..."}],
    dataset_name="My Dataset",
)
```
</python>

<typescript>
```typescript
import { Client } from "langsmith";

const client = new Client();

// Create dataset and add examples
const dataset = await client.createDataset("My Dataset", {
  description: "Evaluation dataset",
});

await client.createExamples({
  inputs: [{ query: "What is AI?" }, { query: "Explain RAG" }],
  outputs: [{ answer: "AI is..." }, { answer: "RAG is..." }],
  datasetName: "My Dataset",
});
```
</typescript>
</creating_datasets>

<dataset_structures>
## Dataset Structures by Type

### Final Response

```json
{"trace_id": "...", "inputs": {"query": "What are the top genres?"}, "outputs": {"response": "The top genres are..."}}
```

### Single Step

```json
{"trace_id": "...", "inputs": {"messages": [...]}, "outputs": {"content": "..."}, "metadata": {"node_name": "model"}}
```

### Trajectory

```json
{"trace_id": "...", "inputs": {"query": "..."}, "outputs": {"expected_trajectory": ["tool_a", "tool_b", "tool_c"]}}
```

### RAG

```json
{"trace_id": "...", "inputs": {"question": "How do I..."}, "outputs": {"answer": "...", "retrieved_chunks": ["..."], "cited_chunks": ["..."]}}
```
</dataset_structures>

<script_usage>
## CLI Usage

```bash
# List all datasets
langsmith dataset list --api-key $LANGSMITH_API_KEY

# Get dataset details
langsmith dataset get "My Dataset" --api-key $LANGSMITH_API_KEY

# Create an empty dataset
langsmith dataset create --name "New Dataset" --description "For evaluation" --api-key $LANGSMITH_API_KEY

# Upload a local JSON file
langsmith dataset upload /tmp/dataset.json --name "My Dataset" --api-key $LANGSMITH_API_KEY

# Export a dataset to local file
langsmith dataset export "My Dataset" /tmp/exported.json --limit 100 --api-key $LANGSMITH_API_KEY

# Delete a dataset
langsmith dataset delete "My Dataset" --api-key $LANGSMITH_API_KEY

# List examples in a dataset
langsmith example list --dataset "My Dataset" --limit 10 --api-key $LANGSMITH_API_KEY

# Add an example
langsmith example create --dataset "My Dataset" \
  --inputs '{"query": "test"}' \
  --outputs '{"answer": "result"}' --api-key $LANGSMITH_API_KEY

# List experiments
langsmith experiment list --dataset "My Dataset" --api-key $LANGSMITH_API_KEY
langsmith experiment get "eval-v1" --api-key $LANGSMITH_API_KEY
```
</script_usage>

<example_workflow>
Complete workflow from traces to uploaded LangSmith dataset:

```bash
# 1. Export traces from LangSmith
langsmith trace export ./traces --project my-project --limit 20 --full --api-key $LANGSMITH_API_KEY

# 2. Process traces into dataset format (using Python/JS code)
# See "Creating Datasets" section above

# 3. Upload to LangSmith
langsmith dataset upload /tmp/final_response.json --name "Skills: Final Response" --api-key $LANGSMITH_API_KEY
langsmith dataset upload /tmp/trajectory.json --name "Skills: Trajectory" --api-key $LANGSMITH_API_KEY

# 4. Verify upload
langsmith dataset list --api-key $LANGSMITH_API_KEY
langsmith dataset get "Skills: Final Response" --api-key $LANGSMITH_API_KEY
langsmith example list --dataset "Skills: Final Response" --limit 3 --api-key $LANGSMITH_API_KEY

# 5. Run experiments
langsmith experiment list --dataset "Skills: Final Response" --api-key $LANGSMITH_API_KEY
```
</example_workflow>

<troubleshooting>
**Dataset upload fails:**
- Verify LANGSMITH_API_KEY is set
- Check JSON file is valid: each element needs `inputs` (and optionally `outputs`)
- Dataset name must be unique, or delete the existing dataset first with `langsmith dataset delete`

**Empty dataset after upload:**
- Verify the JSON file contains an array of objects with an `inputs` key
- Check the dataset isn't empty: `langsmith example list --dataset "Name"`

**Export has no data:**
- Ensure traces were exported with the `--full` flag to include inputs/outputs
- Verify traces have both `inputs` and `outputs` populated

**Example count mismatch:**
- Use `langsmith dataset get "Name"` to check the remote count
- Compare with the local file to verify upload completeness
</troubleshooting>
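When an upload fails or comes back empty, a quick local sanity check can confirm the file has the shape the troubleshooting list describes before retrying. The helper below is a sketch (the function name and checks are not part of the CLI or SDK):

```python
import json
from pathlib import Path

def check_dataset_file(path: str) -> list[str]:
    """Return a list of problems found in a local dataset JSON file.

    Mirrors the structural requirements above: the file must contain a
    non-empty array of objects, each with an "inputs" key ("outputs" is
    optional).
    """
    try:
        data = json.loads(Path(path).read_text())
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot parse {path}: {exc}"]
    if not isinstance(data, list):
        return ["top-level value must be an array of examples"]
    problems = []
    if not data:
        problems.append("file contains no examples")
    for i, example in enumerate(data):
        if not isinstance(example, dict) or "inputs" not in example:
            problems.append(f"example {i} is missing an 'inputs' object")
    return problems
```

An empty return value means the file at least satisfies the structural requirements; it does not guarantee the examples are semantically correct for your evaluator.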