Claude Agent Skill · by Aradotso

Karpathy Jobs Bls Visualizer

Install Karpathy Jobs Bls Visualizer skill for Claude Code from aradotso/trending-skills.

Install
Terminal · npx
$ npx skills add https://github.com/anthropics/skills --skill frontend-design

Source file
SKILL.md (361 lines)
---
name: karpathy-jobs-bls-visualizer
description: Research tool for visually exploring BLS Occupational Outlook Handbook data with an interactive treemap, LLM-powered scoring pipeline, and data scraping/parsing utilities.
triggers:
  - "explore BLS job market data"
  - "visualize occupational outlook handbook"
  - "add custom LLM scoring to jobs treemap"
  - "scrape BLS occupation pages"
  - "build AI exposure scores for occupations"
  - "run the jobs visualization pipeline"
  - "customize the treemap color layer"
  - "fork karpathy jobs project"
---

# karpathy/jobs: BLS Job Market Visualizer

> Skill by [ara.so](https://ara.so), Daily 2026 Skills collection.

A research tool for visually exploring Bureau of Labor Statistics [Occupational Outlook Handbook](https://www.bls.gov/ooh/) data across 342 occupations. The interactive treemap sizes each rectangle by employment (area) and colors it by a chosen metric: BLS growth outlook, median pay, education requirements, or LLM-scored AI exposure. The pipeline is fully forkable: write a new prompt, re-run scoring, and get a new color layer.

**Live demo:** [karpathy.ai/jobs](https://karpathy.ai/jobs/)

---

## Installation & Setup

```bash
# Clone the repo
git clone https://github.com/karpathy/jobs
cd jobs

# Install dependencies (uses uv)
uv sync
uv run playwright install chromium
```

Create a `.env` file with your OpenRouter API key (required only for LLM scoring):

```bash
OPENROUTER_API_KEY=your_openrouter_key_here
```

---

## Full Pipeline: Key Commands

Run these in order for a complete fresh build:

```bash
# 1. Scrape BLS pages (non-headless Playwright; BLS blocks bots)
#    Results cached in html/; only needed once
uv run python scrape.py

# 2. Convert raw HTML → clean Markdown in pages/
uv run python process.py

# 3. Extract structured fields → occupations.csv
uv run python make_csv.py

# 4. Score AI exposure via LLM (uses OpenRouter API, saves scores.json)
uv run python score.py

# 5. Merge CSV + scores → site/data.json for the frontend
uv run python build_site_data.py

# 6. Serve the visualization locally
cd site && python -m http.server 8000
# Open http://localhost:8000
```

---

## Key Files Reference

| File | Description |
|------|-------------|
| `occupations.json` | Master list of 342 occupations (title, URL, category, slug) |
| `occupations.csv` | Summary stats: pay, education, job count, growth projections |
| `scores.json` | AI exposure scores (0–10) + rationales for all 342 occupations |
| `prompt.md` | All data in one ~45K-token file for pasting into an LLM |
| `html/` | Raw HTML pages from BLS (~40MB, source of truth) |
| `pages/` | Clean Markdown versions of each occupation page |
| `site/index.html` | The treemap visualization (single HTML file) |
| `site/data.json` | Compact merged data consumed by the frontend |
| `score.py` | LLM scoring pipeline; fork this to write custom prompts |

---

## Writing a Custom LLM Scoring Layer

The most powerful feature: write any scoring prompt, run `score.py`, and get a new treemap color layer.

### 1. Edit the prompt in `score.py`

```python
# score.py (simplified structure)
SYSTEM_PROMPT = """You are evaluating occupations for exposure to humanoid robotics over the next 10 years.

Score each occupation from 0 to 10:
- 0 = no meaningful exposure (e.g., requires fine social judgment, non-physical)
- 5 = moderate exposure (some tasks automatable, but humans still central)
- 10 = high exposure (repetitive physical tasks, predictable environments)

Consider: physical task complexity, environment predictability, dexterity requirements,
cost of robot vs human, regulatory barriers.

Respond ONLY with JSON: {"score": <int 0-10>, "rationale": "<1-2 sentences>"}"""
```

### 2. Run the scoring pipeline

Run `uv run python score.py` (step 4 of the pipeline above). The result is keyed by occupation slug:

```python
# The pipeline reads each occupation's Markdown from pages/,
# sends it to the LLM, and writes results to scores.json

# scores.json structure:
{
  "software-developers": {
    "score": 1,
    "rationale": "Software development is digital and cognitive; humanoid robots provide no advantage."
  },
  "construction-laborers": {
    "score": 7,
    "rationale": "Physical, repetitive outdoor tasks are targets for humanoid robotics, though unstructured environments remain challenging."
  }
  # ... 342 occupations total
}
```

### 3. Rebuild site data

```bash
uv run python build_site_data.py
cd site && python -m http.server 8000
```

---

## Data Structures

### `occupations.json` entry

```json
{
  "title": "Software Developers",
  "url": "https://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm",
  "category": "Computer and Information Technology",
  "slug": "software-developers"
}
```

### `occupations.csv` columns

```
slug, title, category, median_pay, education, job_count, growth_percent, growth_outlook
```

Example row:

```
software-developers, Software Developers, Computer and Information Technology, 130160, Bachelor's degree, 1847900, 17, Much faster than average
```

### `site/data.json` entry (merged frontend data)

```json
{
  "slug": "software-developers",
  "title": "Software Developers",
  "category": "Computer and Information Technology",
  "median_pay": 130160,
  "education": "Bachelor's degree",
  "job_count": 1847900,
  "growth_percent": 17,
  "growth_outlook": "Much faster than average",
  "ai_score": 9,
  "ai_rationale": "AI is deeply transforming software development workflows..."
}
```

---

## Frontend Treemap (`site/index.html`)

The visualization is a single self-contained HTML file using D3.js.

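The page reads `site/data.json`, whose entries combine the CSV fields with the LLM scores. The following is a minimal sketch of such a merge, not the project's actual `build_site_data.py`. The helper name `merge_records` and the in-memory sample inputs are assumptions made for illustration; the sample values are the ones shown in this document.

```python
import csv
import io
import json


def merge_records(csv_text, scores):
    """Join CSV rows with score entries by slug (hypothetical helper)."""
    numeric_fields = ("median_pay", "job_count", "growth_percent")
    records = []
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        # Strip stray whitespace around header names and values.
        record = {key.strip(): value.strip() for key, value in row.items()}
        for field in numeric_fields:
            record[field] = int(record[field])
        # Occupations without a score get None rather than raising.
        entry = scores.get(record["slug"], {})
        record["ai_score"] = entry.get("score")
        record["ai_rationale"] = entry.get("rationale")
        records.append(record)
    return records


sample_csv = (
    "slug,title,category,median_pay,education,job_count,growth_percent,growth_outlook\n"
    "software-developers,Software Developers,Computer and Information Technology,"
    "130160,Bachelor's degree,1847900,17,Much faster than average\n"
)
sample_scores = {
    "software-developers": {
        "score": 9,
        "rationale": "AI is deeply transforming software development workflows...",
    }
}

merged = merge_records(sample_csv, sample_scores)
print(json.dumps(merged, indent=2))
```

Keeping unscored occupations with a `None` score (serialized as `null`) is one possible design choice; the frontend would then need to handle missing values when coloring.
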
### Color layers (toggle in UI)

| Layer | What it shows |
|-------|---------------|
| BLS Outlook | BLS projected growth category (green = fast growth) |
| Median Pay | Annual median wage (color gradient) |
| Education | Minimum education required |
| Digital AI Exposure | LLM-scored 0–10 AI impact estimate |

### Adding a new color layer to the frontend

```html
<!-- In site/index.html, find the layer toggle buttons -->
<button onclick="setLayer('ai_score')">Digital AI Exposure</button>

<!-- Add your new layer button -->
<button onclick="setLayer('robotics_score')">Humanoid Robotics</button>
```

```javascript
// In the color function (shown here as getColor), add a case for your new field:
function getColor(d, layer) {
  if (layer === 'robotics_score') {
    // scores 0-10, blue = low exposure, red = high
    return d3.interpolateRdYlBu(1 - d.robotics_score / 10);
  }
  // ... existing cases
}
```

Then update `build_site_data.py` to include your new score field in `data.json`.

---

## Generating the LLM-Ready Prompt File

Package all 342 occupations plus aggregate stats into a single file for an LLM chat:

```bash
uv run python make_prompt.py
# Produces prompt.md (~45K tokens)
# Paste into Claude, GPT-4, Gemini, etc. for data-grounded conversation
```

---

## Scraping Notes

The BLS blocks automated bots, so `scrape.py` uses **non-headless** Playwright (a real, visible browser window):

```python
# scrape.py key behavior (excerpt, not runnable on its own)
browser = await p.chromium.launch(headless=False)  # Must be visible
# Pages saved to html/<slug>.html
# Already-scraped pages are skipped (cached)
```

If scraping fails or is rate-limited:

- The `html/` directory already contains cached pages in the repo
- You can skip scraping entirely and run from `process.py` onward
- If re-scraping, add delays between requests to avoid blocks

---

## Common Patterns

### Re-score only missing occupations

```python
import json

with open("scores.json") as f:
    existing = json.load(f)

with open("occupations.json") as f:
    all_occupations = json.load(f)

# Find gaps
missing = [o for o in all_occupations if o["slug"] not in existing]
print(f"Missing scores: {len(missing)}")
# Then run score.py with a filter for missing slugs
```

### Parse a single occupation page manually

```python
from parse_detail import parse_occupation_page
from pathlib import Path

html = Path("html/software-developers.html").read_text()
data = parse_occupation_page(html)
print(data["median_pay"])     # e.g. 130160
print(data["job_count"])      # e.g. 1847900
print(data["growth_outlook"]) # e.g. "Much faster than average"
```

### Load and query occupations.csv

```python
import pandas as pd

df = pd.read_csv("occupations.csv")

# Top 10 highest paying occupations
top_pay = df.nlargest(10, "median_pay")[["title", "median_pay", "growth_outlook"]]
print(top_pay)

# Filter: fast growth + high pay
high_value = df[
    (df["growth_percent"] > 10) &
    (df["median_pay"] > 80000)
].sort_values("median_pay", ascending=False)
```

### Combine CSV with AI scores for analysis

```python
import pandas as pd, json

df = pd.read_csv("occupations.csv")

with open("scores.json") as f:
    scores = json.load(f)

df["ai_score"] = df["slug"].map(lambda s: scores.get(s, {}).get("score"))
df["ai_rationale"] = df["slug"].map(lambda s: scores.get(s, {}).get("rationale"))

# High AI exposure, high pay: reshaping, not disappearing
high_exposure_high_pay = df[
    (df["ai_score"] >= 8) &
    (df["median_pay"] > 100000)
][["title", "median_pay", "ai_score", "growth_outlook"]]
print(high_exposure_high_pay)
```

---

## Troubleshooting

**`playwright install` fails**

```bash
uv run playwright install --with-deps chromium
```

**BLS scraping blocked / returns empty pages**

- Ensure `headless=False` in `scrape.py` (already the default)
- Add manual delays; do not run in CI
- The cached `html/` directory in the repo can be used directly

**`score.py` OpenRouter errors**

- Verify `OPENROUTER_API_KEY` is set in `.env`
- Check that your OpenRouter account has credits
- Default model is Gemini Flash; change `model` in `score.py` to use a different LLM

**`site/data.json` not updating after re-scoring**

```bash
# Always rebuild site data after changing scores.json
uv run python build_site_data.py
```

**Treemap shows blank / no data**

- Confirm `site/data.json` exists and is valid JSON
- Serve with `python -m http.server` (not `file://`, where CORS blocks the local JSON fetch)
- Check the browser console for fetch errors

---

## Important Caveats (from the project)

- **AI Exposure ≠ job disappearance.** A score of 9/10 means AI is *transforming* the work, not eliminating demand. Software developers score 9/10 but demand is growing.
- **Scores are rough LLM estimates** (Gemini Flash via OpenRouter), not rigorous economic predictions.
- The tool does **not** account for demand elasticity, latent demand, regulatory barriers, or social preferences for human workers.
- This is a **development/research tool**, not an economic publication.
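
Because the scores are rough LLM output, a quick structural check of `scores.json` before rebuilding the site data may catch malformed entries. The sketch below is an assumption-laden illustration, not part of the project: `validate_scores` is a hypothetical helper, and the `example-*` slugs and their values are made up to trigger each check.

```python
def validate_scores(scores, expected_slugs):
    """Return a list of problems found in a scores mapping (hypothetical helper)."""
    problems = []
    for slug in expected_slugs:
        entry = scores.get(slug)
        if entry is None:
            problems.append(f"{slug}: missing")
            continue
        score = entry.get("score")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(score, int) or isinstance(score, bool) or not 0 <= score <= 10:
            problems.append(f"{slug}: score {score!r} is not an integer in 0-10")
        if not str(entry.get("rationale") or "").strip():
            problems.append(f"{slug}: empty rationale")
    return problems


# Made-up sample: one valid entry, one deliberately broken entry.
sample = {
    "software-developers": {
        "score": 9,
        "rationale": "AI is deeply transforming software development workflows...",
    },
    "example-out-of-range": {"score": 11, "rationale": ""},
}
issues = validate_scores(
    sample,
    ["software-developers", "example-out-of-range", "example-missing-slug"],
)
for issue in issues:
    print(issue)
```

In practice the expected slugs could come from `occupations.json`, so that gaps and out-of-range values surface before `build_site_data.py` runs.
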