How Speak Tts fits into a Paperclip company.

Speak Tts drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.
Source file: SKILL.md (375 lines, markdown)

---
name: speak-tts
description: Give your agent the ability to speak to you in real time. Talk to your Claude! Local TTS, text-to-speech, voice synthesis, and audio generation with voice cloning on Apple Silicon. Use for reading articles aloud, audiobook narration, or voice responses. Runs entirely on-device via MLX (private, no API keys).
---

# speak - Talk to your Claude!

Give your agent the ability to speak to you in real time. Local text-to-speech, voice cloning, and audio generation on Apple Silicon.

## Prerequisites

| Requirement | Check | Install |
|-------------|-------|---------|
| Apple Silicon Mac | `uname -m` → arm64 | Intel not supported |
| macOS 12.0+ | `sw_vers` | - |
| sox | `which sox` | `brew install sox` |
| ffmpeg | `which ffmpeg` | `brew install ffmpeg` |
| poppler (PDF) | `which pdftotext` | `brew install poppler` |

## Input Sources

| Source | Example |
|--------|---------|
| Text file | `speak article.txt` |
| Markdown | `speak doc.md` |
| Direct string | `speak "Hello"` |
| Clipboard | `pbpaste \| speak` |
| Stdin | `cat file.txt \| speak` |

### Web Articles

Requires `lynx` (`brew install lynx`), which is not in the prerequisites above.

```bash
lynx -dump -nolist "https://example.com/article" | speak --output article.wav
```

### Converting Formats

| Format | Convert Command |
|--------|-----------------|
| PDF | `pdftotext doc.pdf doc.txt` |
| DOCX | `textutil -convert txt doc.docx` (built into macOS; writes `doc.txt`) |
| HTML | `pandoc -f html -t plain doc.html > doc.txt` (requires `brew install pandoc`) |

## Output Modes

| Goal | Command |
|------|---------|
| Save for later | `speak text.txt --output file.wav` |
| Listen now (streaming) | `speak text.txt --stream` |
| Listen now (after generation completes) | `speak text.txt --play` |
| Stream and save | `speak text.txt --stream --output file.wav` |

### Default Behavior
```bash
speak article.txt          # → ~/Audio/speak/article.wav (no playback)
speak "Hello"              # → ~/Audio/speak/speak_<timestamp>.wav
```

## Directory Auto-Creation

| Directory | Auto-Created? |
|-----------|---------------|
| `~/Audio/speak/` | ✓ Yes |
| `~/.chatter/voices/` | ✗ No |
| Custom directories | ✗ No |

**Always create custom directories first:**
```bash
mkdir -p ~/.chatter/voices/
mkdir -p ~/Audio/custom/
```

## Voice Cloning

Voice cloning generates speech that matches your vocal characteristics (pitch, tone, cadence) from a short recording.

### Quality Expectations
- Output captures general voice characteristics but is **not a perfect replica**
- Quality depends heavily on sample quality
- 15-25 seconds is optimal (10 s minimum, 30 s maximum)

### Recording Your Voice

**Using QuickTime:**
1. Open QuickTime Player → File → New Audio Recording
2. Record 20 seconds of clear speech
3. File → Export As → Audio Only (.m4a)
4. Convert to WAV (see below)

**Using sox (command line):**
```bash
mkdir -p ~/.chatter/voices/   # this directory is not auto-created
# -d = use default microphone; -r 24000 -c 1 = write 24 kHz mono
# Recording starts immediately and stops after 25 seconds (trim 0 25)
sox -d -r 24000 -c 1 ~/.chatter/voices/my_voice.wav trim 0 25
```

### Converting to Required Format

Voice samples **MUST** be WAV, 24000 Hz, mono, and 10-30 seconds long.

```bash
# From MP3
ffmpeg -i voice.mp3 -ar 24000 -ac 1 voice.wav

# From M4A (QuickTime)
ffmpeg -i voice.m4a -ar 24000 -ac 1 voice.wav

# Trim to 25 seconds
ffmpeg -i long.wav -t 25 -ar 24000 -ac 1 trimmed.wav

# Check sample properties (ffprobe prints to stderr, hence 2>&1)
ffprobe -i voice.wav 2>&1 | grep -E "Duration|Stream"
# Should show: Duration ~15-25s, 24000 Hz, mono
```

### Using Your Voice

```bash
# Create directory
mkdir -p ~/.chatter/voices/

# Move sample
mv voice.wav ~/.chatter/voices/my_voice.wav

# Test
speak "Testing my voice" --voice ~/.chatter/voices/my_voice.wav --stream

# Use for content
speak notes.txt --voice ~/.chatter/voices/my_voice.wav --output presentation.wav
```

**Path requirements:**
- ✓ Works: `~/.chatter/voices/my_voice.wav` (tilde expanded by the shell)
- ✓ Works: `/Users/name/.chatter/voices/my_voice.wav`
- ✗ Fails: `my_voice.wav` (relative path)
- ✗ Fails: `./voices/my_voice.wav` (relative path)

### Voice Sample Tips

| Good Sample | Bad Sample |
|-------------|------------|
| Quiet room | Background noise |
| Natural pace | Rushed or monotone |
| Clear diction | Mumbling |
| Varied content | Repetitive phrases |

## Default Voice

When `--voice` is omitted, a built-in default voice is used:
```bash
speak "Hello world" --stream  # Uses default voice
```

## Emotion Tags

Tags produce **audible effects** (actual sounds), not spoken words:

```bash
speak "[sigh] Monday again." --stream
# Output: (sigh sound) "Monday again."
```

| Tag | Effect |
|-----|--------|
| `[laugh]` | Laughter |
| `[chuckle]` | Light chuckle |
| `[sigh]` | Sighing |
| `[gasp]` | Gasping |
| `[groan]` | Groaning |
| `[clear throat]` | Throat clearing |
| `[cough]` | Coughing |
| `[crying]` | Crying |
| `[singing]` | Sung speech |

**NOT supported:** `[pause]`, `[whisper]` (ignored)

**For pauses:** use punctuation, e.g. `"Wait... let me think."`

## Batch Processing

```bash
mkdir -p ~/Audio/book/
speak ch01.txt ch02.txt ch03.txt --output-dir ~/Audio/book/
# Creates: ~/Audio/book/ch01.wav, ch02.wav, ch03.wav

# With auto-chunking (for long files)
speak chapters/*.txt --output-dir ~/Audio/book/ --auto-chunk

# Skip completed files
speak chapters/*.txt --output-dir ~/Audio/book/ --skip-existing
```

### Auto-Chunk Behavior

When using `--auto-chunk` with batch processing:
1. Each input file is chunked **independently**
2. Chunks are generated and **automatically concatenated** per file
3. Final output: one `.wav` per input file (e.g., `ch01.wav`)
4. Intermediate chunks are deleted (unless `--keep-chunks` is set)

**You don't need to concatenate chunks manually.** Only concatenate the final per-chapter files.

## Concatenating Audio

```bash
# Explicit order (recommended)
speak concat ch01.wav ch02.wav ch03.wav --output book.wav

# Glob pattern (REQUIRES zero-padded filenames)
speak concat audiobook/*.wav --output book.wav
```

### Zero-Padding Rules

**Critical for correct concatenation order:**

| Number of files | Correct | Wrong |
|-----------------|---------|-------|
| 1-9 | `01`, `02`, ..., `09` | `1`, `2`, ..., `9` (breaks once a 10th file is added) |
| 10-99 | `01`, `02`, ..., `99` | `1`, `10`, `2`, ... |
| 100-999 | `001`, `002`, ..., `999` | `1`, `100`, `2`, ... |

**Why:** Shell glob expansion sorts names character by character, not numerically, so you get `1, 10, 2` instead of `1, 2, 10`. Zero-padded names sort correctly: `01, 02, 10`.

## PDF to Audiobook (Complete Workflow)

### Step 1: Find Chapter Boundaries
```bash
# Preview table of contents (pages 1-5)
pdftotext -f 1 -l 5 textbook.pdf toc.txt
cat toc.txt  # Note chapter page numbers

# Or search for "Chapter" markers (-n prints text line numbers, not page numbers)
pdftotext textbook.pdf - | grep -n "Chapter"
```

### Step 2: Extract Chapters (Zero-Padded!)
```bash
# Example page ranges for a 100-page book with ~10 chapters
pdftotext -f 1 -l 12 -layout textbook.pdf ch01.txt
pdftotext -f 13 -l 25 -layout textbook.pdf ch02.txt
pdftotext -f 26 -l 38 -layout textbook.pdf ch03.txt
# ... continue for all chapters
```

### Step 3: Estimate Time
```bash
speak --estimate ch*.txt
# Shows: total audio duration, generation time, storage needed

# Quick estimates:
# 1 page ≈ 2 min of audio ≈ 1 min of generation
# 100 pages ≈ 200 min of audio ≈ 100 min of generation ≈ 500 MB
```

### Step 4: Generate Audio
```bash
mkdir -p audiobook/
speak ch01.txt ch02.txt ch03.txt --output-dir audiobook/ --auto-chunk
# Creates: audiobook/ch01.wav, audiobook/ch02.wav, audiobook/ch03.wav
```

### Step 5: Concatenate
```bash
speak concat audiobook/ch01.wav audiobook/ch02.wav audiobook/ch03.wav --output complete_audiobook.wav
# Or with glob (only if zero-padded):
speak concat audiobook/ch*.wav --output complete_audiobook.wav
```

### PDF Troubleshooting

| Issue | Solution |
|-------|----------|
| Empty/garbled text | Probably a scanned PDF; run OCR first (e.g. `brew install tesseract`) |
| Wrong encoding | Try: `pdftotext -enc UTF-8 doc.pdf` |
| Suspiciously little text | Check the word count: `pdftotext doc.pdf - \| wc -w` (should be >100) |

## Multi-Voice Content

```bash
mkdir -p podcast/scripts podcast/wav

echo "Welcome to the show." > podcast/scripts/01_host.txt
echo "Thanks for having me." > podcast/scripts/02_guest.txt

speak podcast/scripts/01_host.txt --voice ~/.chatter/voices/host.wav --output podcast/wav/01.wav
speak podcast/scripts/02_guest.txt --voice ~/.chatter/voices/guest.wav --output podcast/wav/02.wav

speak concat podcast/wav/01.wav podcast/wav/02.wav --output podcast.wav
```

## Options Reference

| Option | Description | Default |
|--------|-------------|---------|
| `--stream` | Stream audio as it generates | false |
| `--play` | Play after generation completes | false |
| `--output <path>` | Output file | ~/Audio/speak/ |
| `--output-dir <dir>` | Batch output directory | - |
| `--voice <path>` | Voice sample (full path) | built-in voice |
| `--timeout <sec>` | Timeout per file, in seconds | 300 |
| `--auto-chunk` | Split long documents | false |
| `--chunk-size <n>` | Characters per chunk | 6000 |
| `--resume <file>` | Resume from manifest | - |
| `--keep-chunks` | Keep intermediate files | false |
| `--skip-existing` | Skip if output exists | false |
| `--estimate` | Show duration estimate | false |
| `--dry-run` | Preview only | false |
| `--quiet` | Suppress output | false |

## Commands

| Command | Description |
|---------|-------------|
| `speak setup` | Set up environment |
| `speak health` | Check system status |
| `speak models` | List TTS models |
| `speak concat` | Concatenate audio |
| `speak daemon kill` | Stop TTS server |
| `speak config` | Show configuration |

## Performance

| Metric | Value |
|--------|-------|
| Cold start | ~4-8 s |
| Warm start | ~3-8 s |
| Speed | 0.3-0.5 real-time factor (RTF): generation takes 30-50% of the audio's duration |
| Storage | ~2.5 MB/min, ~150 MB/hour |

## Resume Capability

For interrupted long generations:

```bash
# Single file with auto-chunk: use --resume
speak long.txt --auto-chunk --output book.wav
# If interrupted, a manifest is saved at ~/Audio/speak/manifest.json
speak --resume ~/Audio/speak/manifest.json

# Batch processing: use --skip-existing
speak ch*.txt --output-dir audiobook/ --auto-chunk
# If interrupted, re-run the same command with --skip-existing:
speak ch*.txt --output-dir audiobook/ --auto-chunk --skip-existing
```

## Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| "Voice file not found" | Relative path | Use a full path: `~/.chatter/voices/x.wav` |
| "Invalid WAV format" | Wrong specs | Convert: `ffmpeg -i in.wav -ar 24000 -ac 1 out.wav` |
| "Voice sample too short" | <10 seconds | Record 15-25 seconds |
| "Output directory doesn't exist" | Not created | `mkdir -p dirname/` |
| "sox not found" | Not installed | `brew install sox` |
| Scrambled concat order | Filenames not zero-padded | Use `01`, `02`, not `1`, `2` |
| Timeout | Generation took >5 min (default `--timeout 300`) | Use `--auto-chunk` or `--timeout 600` |
| "Server not running" | Stale daemon | `speak daemon kill && speak health` |

## Setup

```bash
speak "test"     # Auto-setup on first run (downloads a ~500 MB model)
speak setup      # Or manual setup
speak health     # Verify everything works
```

## Server Management

The TTS server starts automatically and shuts down after 1 hour of inactivity.

```bash
speak health        # Check status
speak daemon kill   # Stop manually
```
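The zero-padding rules above matter wherever a glob such as `ch*.txt` or `audiobook/*.wav` decides the order. If chapter files were already created as `ch1.txt` ... `ch10.txt`, a short rename pass can fix them before batching or concatenating. This is a minimal bash sketch, not a `speak` command. `zero_pad_rename` is a hypothetical helper. It assumes each name is a non-digit prefix, one run of digits, and a single extension, and it leaves anything else untouched. It relies on bash's `BASH_REMATCH`, so run it under bash rather than pasting it into macOS's default zsh.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of speak: zero-pad the number in names like
# ch1.txt -> ch01.txt so that globs such as ch*.txt expand in numeric order.
# Assumes <non-digit prefix><digits>.<extension>; other names are left alone.
zero_pad_rename() {
  local dir="$1"
  local width="${2:-2}"                    # 2 for fewer than 100 files, 3 for 100-999
  local re='^([^0-9]*)([0-9]+)(\.[^.]+)$'  # captures prefix, number, extension
  local path name new

  for path in "$dir"/*; do
    [[ -f "$path" ]] || continue           # skip directories and an unmatched glob
    name="${path##*/}"
    [[ "$name" =~ $re ]] || continue       # skip names without the expected shape
    # 10# forces base 10, so "08" is not rejected as an invalid octal number
    printf -v new "%s%0${width}d%s" "${BASH_REMATCH[1]}" \
      "$((10#${BASH_REMATCH[2]}))" "${BASH_REMATCH[3]}"
    [[ "$name" == "$new" ]] && continue    # already padded
    if [[ -e "$dir/$new" ]]; then
      echo "skip: $dir/$new already exists" >&2   # never overwrite
    else
      mv -- "$path" "$dir/$new"
    fi
  done
}
```

For example, running `zero_pad_rename ~/Audio/book 2` in a folder holding `ch1.txt` through `ch10.txt` leaves `ch01.txt` through `ch10.txt`, after which `ch*.txt` expands in numeric order. Pass a width of 3 for 100-999 files, as in the table. Existing targets are reported and skipped instead of overwritten, so re-running it is harmless.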