How Karpathy Jobs BLS Visualizer fits into a Paperclip company.

Karpathy Jobs BLS Visualizer drops into any Paperclip agent that handles occupational-data research and visualization. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat, with no prompt engineering and no tool wiring.
Source file: SKILL.md (361 lines, Markdown)
---
name: karpathy-jobs-bls-visualizer
description: Research tool for visually exploring BLS Occupational Outlook Handbook data with an interactive treemap, LLM-powered scoring pipeline, and data scraping/parsing utilities.
triggers:
  - "explore BLS job market data"
  - "visualize occupational outlook handbook"
  - "add custom LLM scoring to jobs treemap"
  - "scrape BLS occupation pages"
  - "build AI exposure scores for occupations"
  - "run the jobs visualization pipeline"
  - "customize the treemap color layer"
  - "fork karpathy jobs project"
---

# karpathy/jobs: BLS Job Market Visualizer

> Skill by [ara.so](https://ara.so), Daily 2026 Skills collection.

A research tool for visually exploring Bureau of Labor Statistics [Occupational Outlook Handbook](https://www.bls.gov/ooh/) data across 342 occupations. The interactive treemap sizes each rectangle by employment (area) and colors it by a chosen metric: BLS growth outlook, median pay, education requirements, or LLM-scored AI exposure. The pipeline is fully forkable: write a new prompt, re-run scoring, and get a new color layer.

**Live demo:** [karpathy.ai/jobs](https://karpathy.ai/jobs/)

---

## Installation & Setup

```bash
# Clone the repo
git clone https://github.com/karpathy/jobs
cd jobs

# Install dependencies (uses uv)
uv sync
uv run playwright install chromium
```

Create a `.env` file with your OpenRouter API key (required only for LLM scoring):

```bash
OPENROUTER_API_KEY=your_openrouter_key_here
```

---

## Full Pipeline: Key Commands

Run these in order for a complete fresh build:

```bash
# 1. Scrape BLS pages (non-headless Playwright; BLS blocks bots)
#    Results cached in html/; only needed once
uv run python scrape.py

# 2. Convert raw HTML → clean Markdown in pages/
uv run python process.py

# 3. Extract structured fields → occupations.csv
uv run python make_csv.py

# 4. Score AI exposure via LLM (uses OpenRouter API, saves scores.json)
uv run python score.py

# 5. Merge CSV + scores → site/data.json for the frontend
uv run python build_site_data.py

# 6. Serve the visualization locally
cd site && python -m http.server 8000
# Open http://localhost:8000
```

---

## Key Files Reference

| File | Description |
|------|-------------|
| `occupations.json` | Master list of 342 occupations (title, URL, category, slug) |
| `occupations.csv` | Summary stats: pay, education, job count, growth projections |
| `scores.json` | AI exposure scores (0–10) + rationales for all 342 occupations |
| `prompt.md` | All data in one ~45K-token file for pasting into an LLM |
| `html/` | Raw HTML pages from BLS (~40MB, source of truth) |
| `pages/` | Clean Markdown versions of each occupation page |
| `site/index.html` | The treemap visualization (single HTML file) |
| `site/data.json` | Compact merged data consumed by the frontend |
| `score.py` | LLM scoring pipeline; fork this to write custom prompts |

---

## Writing a Custom LLM Scoring Layer

The most powerful feature: write any scoring prompt, run `score.py`, and get a new treemap color layer.

### 1. Edit the prompt in `score.py`

```python
# score.py (simplified structure)
SYSTEM_PROMPT = """
You are evaluating occupations for exposure to humanoid robotics over the next 10 years.

Score each occupation from 0 to 10:
- 0 = no meaningful exposure (e.g., requires fine social judgment, non-physical)
- 5 = moderate exposure (some tasks automatable, but humans still central)
- 10 = high exposure (repetitive physical tasks, predictable environments)

Consider: physical task complexity, environment predictability, dexterity requirements,
cost of robot vs human, regulatory barriers.

Respond ONLY with JSON: {"score": <int 0-10>, "rationale": "<1-2 sentences>"}
"""
```

### 2. Run the scoring pipeline

```python
# Run: uv run python score.py
# The pipeline reads each occupation's Markdown from pages/,
# sends it to the LLM, and writes results to scores.json.
# scores.json structure:
{
  "software-developers": {
    "score": 1,
    "rationale": "Software development is digital and cognitive; humanoid robots provide no advantage."
  },
  "construction-laborers": {
    "score": 7,
    "rationale": "Physical, repetitive outdoor tasks are targets for humanoid robotics, though unstructured environments remain challenging."
  }
  # ... 342 occupations total
}
```

### 3. Rebuild site data

```bash
uv run python build_site_data.py
cd site && python -m http.server 8000
```

---

## Data Structures

### `occupations.json` entry

```json
{
  "title": "Software Developers",
  "url": "https://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm",
  "category": "Computer and Information Technology",
  "slug": "software-developers"
}
```

### `occupations.csv` columns

```
slug, title, category, median_pay, education, job_count, growth_percent, growth_outlook
```

Example row (wrapped here for readability):
```
software-developers, Software Developers, Computer and Information Technology,
130160, Bachelor's degree, 1847900, 17, Much faster than average
```

### `site/data.json` entry (merged frontend data)

```json
{
  "slug": "software-developers",
  "title": "Software Developers",
  "category": "Computer and Information Technology",
  "median_pay": 130160,
  "education": "Bachelor's degree",
  "job_count": 1847900,
  "growth_percent": 17,
  "growth_outlook": "Much faster than average",
  "ai_score": 9,
  "ai_rationale": "AI is deeply transforming software development workflows..."
}
```

---

## Frontend Treemap (`site/index.html`)

The visualization is a single self-contained HTML file using D3.js.

### Color layers (toggle in UI)

| Layer | What it shows |
|-------|---------------|
| BLS Outlook | BLS projected growth category (green = fast growth) |
| Median Pay | Annual median wage (color gradient) |
| Education | Minimum education required |
| Digital AI Exposure | LLM-scored 0–10 AI impact estimate |

### Adding a new color layer to the frontend

```html
<!-- In site/index.html, find the layer toggle buttons -->
<button onclick="setLayer('ai_score')">Digital AI Exposure</button>

<!-- Add your new layer button -->
<button onclick="setLayer('robotics_score')">Humanoid Robotics</button>
```

```javascript
// In the color function, add a case for your new field:
function getColor(d, layer) {
  if (layer === 'robotics_score') {
    // scores 0-10, blue = low exposure, red = high
    return d3.interpolateRdYlBu(1 - d.robotics_score / 10);
  }
  // ... existing cases
}
```

Then update `build_site_data.py` to include your new score field in `data.json`, since the frontend reads it from each record.

---

## Generating the LLM-Ready Prompt File

Package all 342 occupations + aggregate stats into a single file for LLM chat:

```bash
uv run python make_prompt.py
# Produces prompt.md (~45K tokens)
# Paste into Claude, GPT-4, Gemini, etc. for data-grounded conversation
```

---

## Scraping Notes

BLS blocks automated bots, so `scrape.py` uses **non-headless** Playwright (a real, visible browser window):

```python
# scrape.py key behavior
browser = await p.chromium.launch(headless=False)  # Must be visible
# Pages saved to html/<slug>.html
# Already-scraped pages are skipped (cached)
```

If scraping fails or is rate-limited:
- The `html/` directory already contains cached pages in the repo
- You can skip scraping entirely and run from `process.py` onward
- If re-scraping, add delays between requests to avoid blocks

---

## Common Patterns

### Re-score only missing occupations

```python
import json

with open("scores.json") as f:
    existing = json.load(f)

with open("occupations.json") as f:
    all_occupations = json.load(f)

# Find occupations whose slug has no entry in scores.json
missing = [o for o in all_occupations if o["slug"] not in existing]
print(f"Missing scores: {len(missing)}")
# Then run score.py with a filter for missing slugs
```

### Parse a single occupation page manually

```python
from parse_detail import parse_occupation_page
from pathlib import Path

html = Path("html/software-developers.html").read_text()
data = parse_occupation_page(html)
print(data["median_pay"])     # e.g. 130160
print(data["job_count"])      # e.g. 1847900
print(data["growth_outlook"]) # e.g. "Much faster than average"
```

### Load and query occupations.csv

```python
import pandas as pd

df = pd.read_csv("occupations.csv")

# Top 10 highest paying occupations
top_pay = df.nlargest(10, "median_pay")[["title", "median_pay", "growth_outlook"]]
print(top_pay)

# Filter: fast growth + high pay
high_value = df[
    (df["growth_percent"] > 10) &
    (df["median_pay"] > 80000)
].sort_values("median_pay", ascending=False)
```

### Combine CSV with AI scores for analysis

```python
import pandas as pd, json

df = pd.read_csv("occupations.csv")

with open("scores.json") as f:
    scores = json.load(f)

df["ai_score"] = df["slug"].map(lambda s: scores.get(s, {}).get("score"))
df["ai_rationale"] = df["slug"].map(lambda s: scores.get(s, {}).get("rationale"))

# High AI exposure, high pay: reshaping, not disappearing
high_exposure_high_pay = df[
    (df["ai_score"] >= 8) &
    (df["median_pay"] > 100000)
][["title", "median_pay", "ai_score", "growth_outlook"]]
print(high_exposure_high_pay)
```

---

## Troubleshooting

**`playwright install` fails**
```bash
uv run playwright install --with-deps chromium
```

**BLS scraping blocked / returns empty pages**
- Ensure `headless=False` in `scrape.py` (already the default)
- Add manual delays; do not run in CI
- The cached `html/` directory in the repo can be used directly

**`score.py` OpenRouter errors**
- Verify `OPENROUTER_API_KEY` is set in `.env`
- Check your OpenRouter account has credits
- Default model is Gemini Flash; change `model` in `score.py` for a different LLM

**`site/data.json` not updating after re-scoring**
```bash
# Always rebuild site data after changing scores.json
uv run python build_site_data.py
```

**Treemap shows blank / no data**
- Confirm `site/data.json` exists and is valid JSON
- Serve with `python -m http.server` (not `file://`; browsers block the local JSON fetch under CORS rules)
- Check browser console for fetch errors

---

## Important Caveats (from the project)

- **AI Exposure ≠ job disappearance.** A score of 9/10 means AI is *transforming* the work, not eliminating demand. Software developers score 9/10 but demand is growing.
- **Scores are rough LLM estimates** (Gemini Flash via OpenRouter), not rigorous economic predictions.
- The tool does **not** account for demand elasticity, latent demand, regulatory barriers, or social preferences for human workers.
- This is a **development/research tool**, not an economic publication.
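
---

## Sketch: merging a custom score field into the site data

The frontend section says to update `build_site_data.py` so that a new score field reaches `data.json`, but it does not show what that merge looks like. The following is a minimal sketch of one way to do it, not the project's actual code. The function name `merge_scores`, the `prefix` parameter, and the `robotics_score` / `robotics_rationale` field names are assumptions that simply mirror the `ai_score` / `ai_rationale` pattern shown in the `site/data.json` entry above. The sample scores are the two illustrative entries from the `scores.json` example.

```python
import csv
import io
import json


def merge_scores(csv_text, scores, prefix="robotics"):
    """Attach <prefix>_score and <prefix>_rationale to each CSV row.

    Hypothetical helper: the real build_site_data.py may be organized
    differently. `scores` maps slug -> {"score": ..., "rationale": ...}.
    """
    rows = []
    # skipinitialspace tolerates "a, b, c" style rows like the example row
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        entry = scores.get(row["slug"], {})
        # Unscored occupations get None so the frontend can skip them
        row[f"{prefix}_score"] = entry.get("score")
        row[f"{prefix}_rationale"] = entry.get("rationale")
        rows.append(row)
    return rows


# Two-column sample; the real occupations.csv has more columns
SAMPLE_CSV = """slug, title
software-developers, Software Developers
construction-laborers, Construction Laborers
"""

SAMPLE_SCORES = {
    "software-developers": {
        "score": 1,
        "rationale": "Software development is digital and cognitive; "
                     "humanoid robots provide no advantage.",
    },
    "construction-laborers": {
        "score": 7,
        "rationale": "Physical, repetitive outdoor tasks are targets for "
                     "humanoid robotics, though unstructured environments "
                     "remain challenging.",
    },
}

if __name__ == "__main__":
    merged = merge_scores(SAMPLE_CSV, SAMPLE_SCORES)
    # In a real build this list would be written to site/data.json
    print(json.dumps(merged, indent=2))
```

Keeping each custom layer under its own prefixed field means the existing `ai_score` layer stays untouched, and the `setLayer('robotics_score')` button from the frontend example can read the new field directly.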