Claude Agent Skill · by Anthropic

Data Context Extractor

Install the Data Context Extractor skill for Claude Code from anthropics/knowledge-work-plugins.

Install
Terminal · npx
$ npx skills add https://github.com/anthropics/knowledge-work-plugins --skill data-context-extractor
Works with Paperclip

How Data Context Extractor fits into a Paperclip company.

Data Context Extractor drops into any Paperclip agent that handles data-analysis work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.

SaaS Factory · Paired

Pre-configured AI company — 18 agents, 18 skills, one-time purchase.

$27 (regularly $59)
Explore pack
Source file
SKILL.md · 227 lines
---
name: data-context-extractor
description: >
  Generate or improve a company-specific data analysis skill by extracting tribal knowledge from analysts.

  BOOTSTRAP MODE - Triggers: "Create a data context skill", "Set up data analysis for our warehouse",
  "Help me create a skill for our database", "Generate a data skill for [company]"
  → Discovers schemas, asks key questions, generates initial skill with reference files

  ITERATION MODE - Triggers: "Add context about [domain]", "The skill needs more info about [topic]",
  "Update the data skill with [metrics/tables/terminology]", "Improve the [domain] reference"
  → Loads existing skill, asks targeted questions, appends/updates reference files

  Use when data analysts want Claude to understand their company's specific data warehouse,
  terminology, metrics definitions, and common query patterns.
---

# Data Context Extractor

A meta-skill that extracts company-specific data knowledge from analysts and generates tailored data analysis skills.

## How It Works

This skill has two modes:

1. **Bootstrap Mode**: Create a new data analysis skill from scratch
2. **Iteration Mode**: Improve an existing skill by adding domain-specific reference files

---

## Bootstrap Mode

Use when: User wants to create a new data context skill for their warehouse.

### Phase 1: Database Connection & Discovery

**Step 1: Identify the database type**

Ask: "What data warehouse are you using?"

Common options:
- **BigQuery**
- **Snowflake**
- **PostgreSQL/Redshift**
- **Databricks**

Use `~~data warehouse` tools (query and schema) to connect. If unclear, check available MCP tools in the current session.

**Step 2: Explore the schema**

Use `~~data warehouse` schema tools to:
1. List available datasets/schemas
2. Identify the most important tables (ask user: "Which 3-5 tables do analysts query most often?")
3. Pull schema details for those key tables

Sample exploration queries by dialect:

```sql
-- BigQuery: List datasets
SELECT schema_name FROM INFORMATION_SCHEMA.SCHEMATA

-- BigQuery: List tables in a dataset
SELECT table_name FROM `project.dataset.INFORMATION_SCHEMA.TABLES`

-- Snowflake: List schemas
SHOW SCHEMAS IN DATABASE my_database

-- Snowflake: List tables
SHOW TABLES IN SCHEMA my_schema
```

### Phase 2: Core Questions (Ask These)

After schema discovery, ask these questions conversationally (not all at once):

**Entity Disambiguation (Critical)**

> "When people here say 'user' or 'customer', what exactly do they mean? Are there different types?"

Listen for:
- Multiple entity types (user vs account vs organization)
- Relationships between them (1:1, 1:many, many:many)
- Which ID fields link them together

**Primary Identifiers**

> "What's the main identifier for a [customer/user/account]? Are there multiple IDs for the same entity?"

Listen for:
- Primary keys vs business keys
- UUID vs integer IDs
- Legacy ID systems

**Key Metrics**

> "What are the 2-3 metrics people ask about most? How is each one calculated?"

Listen for:
- Exact formulas (ARR = monthly_revenue × 12)
- Which tables/columns feed each metric
- Time period conventions (trailing 7 days, calendar month, etc.)

**Data Hygiene**

> "What should ALWAYS be filtered out of queries? (test data, fraud, internal users, etc.)"

Listen for:
- Standard WHERE clauses to always include
- Flag columns that indicate exclusions (is_test, is_internal, is_fraud)
- Specific values to exclude (status = 'deleted')

**Common Gotchas**

> "What mistakes do new analysts typically make with this data?"

Listen for:
- Confusing column names
- Timezone issues
- NULL handling quirks
- Historical vs current state tables
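In practice, the Data Hygiene answers above tend to collapse into a standard exclusion pattern that the generated skill repeats in every sample query. A minimal sketch, assuming hypothetical table and flag-column names (`analytics.users`, `is_test`, `is_internal`); the real names come from the analyst's answers:

```sql
-- Hypothetical standard-exclusion pattern built from the Data Hygiene answers.
-- analytics.users, is_test, is_internal, and status are illustrative placeholders.
SELECT user_id, signup_date
FROM analytics.users
WHERE is_test = FALSE
  AND is_internal = FALSE
  AND status != 'deleted'
```

A generated skill can document a block like this once and reuse it across the sample queries in its reference files.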
### Phase 3: Generate the Skill

Create a skill with this structure:

```
[company]-data-analyst/
├── SKILL.md
└── references/
    ├── entities.md          # Entity definitions and relationships
    ├── metrics.md           # KPI calculations
    ├── tables/              # One file per domain
    │   ├── [domain1].md
    │   └── [domain2].md
    └── dashboards.json      # Optional: existing dashboards catalog
```

**SKILL.md Template**: See `references/skill-template.md`

**SQL Dialect Section**: See `references/sql-dialects.md` and include the appropriate dialect notes.

**Reference File Template**: See `references/domain-template.md`

### Phase 4: Package and Deliver

1. Create all files in the skill directory
2. Package as a zip file
3. Present to user with summary of what was captured

---

## Iteration Mode

Use when: User has an existing skill but needs to add more context.

### Step 1: Load Existing Skill

Ask user to upload their existing skill (zip or folder), or locate it if already in the session.

Read the current SKILL.md and reference files to understand what's already documented.

### Step 2: Identify the Gap

Ask: "What domain or topic needs more context? What queries are failing or producing wrong results?"

Common gaps:
- A new data domain (marketing, finance, product, etc.)
- Missing metric definitions
- Undocumented table relationships
- New terminology

### Step 3: Targeted Discovery

For the identified domain:

1. **Explore relevant tables**: Use `~~data warehouse` schema tools to find tables in that domain
2. **Ask domain-specific questions**:
   - "What tables are used for [domain] analysis?"
   - "What are the key metrics for [domain]?"
   - "Any special filters or gotchas for [domain] data?"
3. **Generate new reference file**: Create `references/[domain].md` using the domain template

### Step 4: Update and Repackage

1. Add the new reference file
2. Update SKILL.md's "Knowledge Base Navigation" section to include the new domain
3. Repackage the skill
4. Present the updated skill to user

---

## Reference File Standards

Each reference file should include:

### For Table Documentation

- **Location**: Full table path
- **Description**: What this table contains, when to use it
- **Primary Key**: How to uniquely identify rows
- **Update Frequency**: How often data refreshes
- **Key Columns**: Table with column name, type, description, notes
- **Relationships**: How this table joins to others
- **Sample Queries**: 2-3 common query patterns

### For Metrics Documentation

- **Metric Name**: Human-readable name
- **Definition**: Plain English explanation
- **Formula**: Exact calculation with column references
- **Source Table(s)**: Where the data comes from
- **Caveats**: Edge cases, exclusions, gotchas

### For Entity Documentation

- **Entity Name**: What it's called
- **Definition**: What it represents in the business
- **Primary Table**: Where to find this entity
- **ID Field(s)**: How to identify it
- **Relationships**: How it relates to other entities
- **Common Filters**: Standard exclusions (internal, test, etc.)

---

## Quality Checklist

Before delivering a generated skill, verify:

- [ ] SKILL.md has complete frontmatter (name, description)
- [ ] Entity disambiguation section is clear
- [ ] Key terminology is defined
- [ ] Standard filters/exclusions are documented
- [ ] At least 2-3 sample queries per domain
- [ ] SQL uses correct dialect syntax
- [ ] Reference files are linked from SKILL.md navigation section
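To make the Metrics Documentation standard concrete, the Formula field usually maps to a single reproducible query. A hedged sketch using the ARR example from Phase 2 (ARR = monthly_revenue × 12); the table and column names are hypothetical placeholders:

```sql
-- Sketch of a documented metric query following the Metrics Documentation standard.
-- finance.subscriptions and its columns are illustrative placeholders, not real schema.
SELECT SUM(monthly_revenue) * 12 AS arr
FROM finance.subscriptions
WHERE status = 'active'
  AND is_test = FALSE
```

Pairing each metric's plain-English definition with a query like this makes the "At least 2-3 sample queries per domain" checklist item easy to verify.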