How Baoyu Youtube Transcript fits into a Paperclip company.

Baoyu Youtube Transcript drops into any Paperclip agent that needs to pull transcripts, subtitles, or cover images from YouTube videos. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat, with no prompt engineering or tool wiring.


Source file

SKILL.md (186 lines, markdown)


---
name: baoyu-youtube-transcript
description: Downloads YouTube video transcripts/subtitles and cover images by URL or video ID. Supports multiple languages, translation, chapters, and speaker identification. Caches raw data for fast re-formatting. Use when user asks to "get YouTube transcript", "download subtitles", "get captions", "YouTube字幕", "YouTube封面", "视频封面", "video thumbnail", "video cover image", or provides a YouTube URL and wants the transcript/subtitle text or cover image extracted.
version: 1.1.0
metadata:
  openclaw:
    homepage: https://github.com/JimLiu/baoyu-skills#baoyu-youtube-transcript
    requires:
      anyBins:
        - bun
        - npx
---

# YouTube Transcript

Downloads transcripts (subtitles/captions) from YouTube videos. Works with both manually created and auto-generated transcripts. No API key or browser required: it uses YouTube's InnerTube API directly and automatically falls back to `yt-dlp` when YouTube blocks the direct API path.

Fetches video metadata and cover image on first run, caches raw data for fast re-formatting.

## Script Directory

Scripts in `scripts/` subdirectory. `{baseDir}` = this SKILL.md's directory path. Resolve `${BUN_X}` runtime: if `bun` installed → `bun`; if `npx` available → `npx -y bun`; else suggest installing bun. Replace `{baseDir}` and `${BUN_X}` with actual values.

| Script | Purpose |
|--------|---------|
| `scripts/main.ts` | Transcript download CLI |

## Usage

```bash
# Default: markdown with timestamps (English)
${BUN_X} {baseDir}/scripts/main.ts <youtube-url-or-id>

# Specify languages (priority order)
${BUN_X} {baseDir}/scripts/main.ts <url> --languages zh,en,ja

# Without timestamps
${BUN_X} {baseDir}/scripts/main.ts <url> --no-timestamps

# With chapter segmentation
${BUN_X} {baseDir}/scripts/main.ts <url> --chapters

# With speaker identification (requires AI post-processing)
${BUN_X} {baseDir}/scripts/main.ts <url> --speakers

# SRT subtitle file
${BUN_X} {baseDir}/scripts/main.ts <url> --format srt

# Translate transcript
${BUN_X} {baseDir}/scripts/main.ts <url> --translate zh-Hans

# List available transcripts
${BUN_X} {baseDir}/scripts/main.ts <url> --list

# Force re-fetch (ignore cache)
${BUN_X} {baseDir}/scripts/main.ts <url> --refresh
```

## Options

| Option | Description | Default |
|--------|-------------|---------|
| `<url-or-id>` | YouTube URL or video ID (multiple allowed) | Required |
| `--languages <codes>` | Language codes, comma-separated, in priority order | `en` |
| `--format <fmt>` | Output format: `text`, `srt` | `text` |
| `--translate <code>` | Translate to specified language code | |
| `--list` | List available transcripts instead of fetching | |
| `--timestamps` | Include `[HH:MM:SS → HH:MM:SS]` timestamps per paragraph | on |
| `--no-timestamps` | Disable timestamps | |
| `--chapters` | Chapter segmentation from video description | |
| `--speakers` | Raw transcript with metadata for speaker identification | |
| `--exclude-generated` | Skip auto-generated transcripts | |
| `--exclude-manually-created` | Skip manually created transcripts | |
| `--refresh` | Force re-fetch, ignore cached data | |
| `-o, --output <path>` | Save to specific file path | auto-generated |
| `--output-dir <dir>` | Base output directory | `youtube-transcript` |

## Optional Environment Variables

| Variable | Description |
|----------|-------------|
| `YOUTUBE_TRANSCRIPT_COOKIES_FROM_BROWSER` | Passed to `yt-dlp --cookies-from-browser` during fallback, e.g. `chrome`, `safari`, `firefox`, or `chrome:Profile 1` |

## Input Formats

Accepts any of these as video input:
- Full URL: `https://www.youtube.com/watch?v=dQw4w9WgXcQ`
- Short URL: `https://youtu.be/dQw4w9WgXcQ`
- Embed URL: `https://www.youtube.com/embed/dQw4w9WgXcQ`
- Shorts URL: `https://www.youtube.com/shorts/dQw4w9WgXcQ`
- Video ID: `dQw4w9WgXcQ`

## Output Formats

| Format | Extension | Description |
|--------|-----------|-------------|
| `text` | `.md` | Markdown with frontmatter (incl. `description`), title heading, summary, optional TOC/cover/timestamps/chapters/speakers |
| `srt` | `.srt` | SubRip subtitle format for video players |

## Output Directory

```
youtube-transcript/
├── .index.json                          # Video ID → directory path mapping (for cache lookup)
└── {channel-slug}/{title-full-slug}/
    ├── meta.json                        # Video metadata (title, channel, description, duration, chapters, etc.)
    ├── transcript-raw.json              # Raw transcript snippets from YouTube API (cached)
    ├── transcript-sentences.json        # Sentence-segmented transcript (split by punctuation, merged across snippets)
    ├── imgs/
    │   └── cover.jpg                    # Video thumbnail
    ├── transcript.md                    # Markdown transcript (generated from sentences)
    └── transcript.srt                   # SRT subtitle (generated from raw snippets, if --format srt)
```

- `{channel-slug}`: Channel name in kebab-case
- `{title-full-slug}`: Full video title in kebab-case

The `--list` mode outputs to stdout only (no file saved).

## Caching

On first fetch, the script saves:
- `meta.json`: video metadata, chapters, cover image path, language info
- `transcript-raw.json`: raw transcript snippets from YouTube API (`{ text, start, duration }[]`)
- `transcript-sentences.json`: sentence-segmented transcript (`{ text, start: "HH:mm:ss", end: "HH:mm:ss" }[]`), split by sentence-ending punctuation (`.?!…。？！` etc.), timestamps proportionally allocated by character length, CJK-aware text merging
- `imgs/cover.jpg`: video thumbnail

Subsequent runs for the same video use cached data (no network calls). Use `--refresh` to force re-fetch. If a different language is requested, the cache is automatically refreshed.

When YouTube returns anti-bot / blocked responses on the direct InnerTube path, the script retries with alternate client identities and then falls back to `yt-dlp` if available. If fallback is needed but `yt-dlp` is unavailable, the agent should decide how to make `yt-dlp` available and continue rather than pushing the installation decision to the user.

SRT output (`--format srt`) is generated from `transcript-raw.json`. Text/markdown output uses `transcript-sentences.json` for natural sentence boundaries.

## Workflow

When user provides a YouTube URL and wants the transcript:

1. Run with `--list` first if the user hasn't specified a language, to show available options
2. **Always single-quote the URL** when running the script. zsh treats `?` as a glob wildcard, so an unquoted YouTube URL causes "no matches found": use `'https://www.youtube.com/watch?v=ID'`
3. Default: run with `--chapters --speakers` for the richest output (chapters + speaker identification)
4. The script auto-saves cached data + output file and prints the file path
5. For `--speakers` mode: after the script saves the raw file, follow the speaker identification workflow below to post-process with speaker labels

When user only wants a cover image or metadata, running the script with any option will also cache `meta.json` and `imgs/cover.jpg`.

When re-formatting the same video (e.g., first text then SRT), the cached data is reused, so no re-fetch is needed.

## Chapter & Speaker Workflow

### Chapters (`--chapters`)

The script parses chapter timestamps from the video description (e.g., `0:00 Introduction`), segments the transcript by chapter boundaries, groups snippets into readable paragraphs, and saves as `.md` with a Table of Contents. No further processing needed.

If no chapter timestamps exist in the description, the transcript is output as grouped paragraphs without chapter headings.

### Speaker Identification (`--speakers`)

Speaker identification requires AI processing. The script outputs a raw `.md` file containing:
- YAML frontmatter with video metadata (title, channel, date, cover, description, language)
- Video description (for speaker name extraction)
- Chapter list from description (if available)
- Raw transcript in SRT format (pre-computed start/end timestamps, token-efficient)

After the script saves the raw file, spawn a sub-agent (use a cheaper model like Sonnet for cost efficiency) to process speaker identification:

1. Read the saved `.md` file
2. Read the prompt template at `{baseDir}/prompts/speaker-transcript.md`
3. Process the raw transcript following the prompt:
   - Identify speakers using video metadata (title → guest, channel → host, description → names)
   - Detect speaker turns from conversation flow, question-answer patterns, and contextual cues
   - Segment into chapters (use description chapters if available, else create from topic shifts)
   - Format with `**Speaker Name:**` labels, paragraph grouping (2-4 sentences), and `[HH:MM:SS → HH:MM:SS]` timestamps
4. Overwrite the `.md` file with the processed transcript (keep the YAML frontmatter)

When `--speakers` is used, `--chapters` is implied; the processed output always includes chapter segmentation.

## Error Cases

| Error | Meaning |
|-------|---------|
| Transcripts disabled | Video has no captions at all |
| No transcript found | Requested language not available |
| Video unavailable | Video deleted, private, or region-locked |
| IP blocked | Too many requests, try again later |
| Age restricted | Video requires login for age verification |
| Bot detected | The script retries alternate clients and then `yt-dlp`; if fallback tooling is missing, the agent should resolve that itself, otherwise if it still fails try `YOUTUBE_TRANSCRIPT_COOKIES_FROM_BROWSER=safari` (or your browser) |
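
To illustrate the sentence segmentation described under Caching, the sketch below splits one raw snippet on sentence-ending punctuation and gives each sentence a share of the snippet's duration proportional to its character count. It is a minimal sketch under stated assumptions, not the code in `scripts/main.ts`: the names `Snippet`, `Sentence`, `formatClock`, and `splitSnippet` are hypothetical, only the punctuation marks quoted in the Caching section are handled, and merging sentences across snippets (which the skill also does) is left out.

```typescript
// Hypothetical types mirroring the shapes quoted in the Caching section.
interface Snippet { text: string; start: number; duration: number } // seconds
interface Sentence { text: string; start: string; end: string }     // "HH:mm:ss"

// Format a number of seconds as HH:mm:ss (fractions are floored).
function formatClock(seconds: number): string {
  const total = Math.max(0, Math.floor(seconds));
  const pad = (n: number): string => String(n).padStart(2, "0");
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  return `${pad(h)}:${pad(m)}:${pad(s)}`;
}

// Split one snippet into sentences. Each sentence receives a slice of the
// snippet's duration proportional to its length in characters.
function splitSnippet(snippet: Snippet): Sentence[] {
  // A sentence is a run of non-terminators followed by any terminators.
  const pattern = /[^.?!…。？！]+[.?!…。？！]*/g;
  const parts = (snippet.text.match(pattern) ?? [])
    .map((part) => part.trim())
    .filter((part) => part.length > 0);

  const totalChars = parts.reduce((sum, part) => sum + part.length, 0);
  let cursor = snippet.start;

  return parts.map((text) => {
    const share = (text.length / totalChars) * snippet.duration;
    const sentence: Sentence = {
      text,
      start: formatClock(cursor),
      end: formatClock(cursor + share),
    };
    cursor += share;
    return sentence;
  });
}
```

For example, a 6-second snippet starting at 10 s with the text `Hello world. How are you?` splits into two 12-character sentences, so each gets 3 seconds: `00:00:10` to `00:00:13` and `00:00:13` to `00:00:16`.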
