Claude Agent Skill · by Aradotso

Voicebox Voice Synthesis

Install Voicebox Voice Synthesis skill for Claude Code from aradotso/trending-skills.

Works with Paperclip

How Voicebox Voice Synthesis fits into a Paperclip company.

Voicebox Voice Synthesis drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.

SaaS Factory (paired pack) — pre-configured AI company: 18 agents, 18 skills, one-time purchase. $27 (regular $59).
Source file: SKILL.md (634 lines)
---
name: voicebox-voice-synthesis
description: Expert skill for Voicebox — the open-source local voice cloning and TTS studio built with Tauri, React, and FastAPI
triggers:
  - "clone a voice with voicebox"
  - "generate speech locally with voicebox"
  - "set up voicebox voice synthesis"
  - "use voicebox API to synthesize speech"
  - "add TTS to my app with voicebox"
  - "configure voicebox TTS engine"
  - "apply voice effects in voicebox"
  - "voicebox stories editor multi-voice"
---

# Voicebox Voice Synthesis Studio

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

Voicebox is a local-first, open-source voice cloning and TTS studio — a self-hosted alternative to ElevenLabs. It runs entirely on your machine (macOS MLX/Metal, Windows/Linux CUDA, CPU fallback), exposes a REST API on `localhost:17493`, and ships with 5 TTS engines, 23 languages, post-processing effects, and a multi-track Stories editor.

---

## Installation

### Pre-built Binaries (Recommended)

| Platform | Link |
|---|---|
| macOS Apple Silicon | https://voicebox.sh/download/mac-arm |
| macOS Intel | https://voicebox.sh/download/mac-intel |
| Windows | https://voicebox.sh/download/windows |
| Docker | `docker compose up` |

Linux requires building from source: https://voicebox.sh/linux-install

### Build from Source

**Prerequisites:** [Bun](https://bun.sh), [Rust](https://rustup.rs), [Python 3.11+](https://python.org), Tauri prerequisites

```bash
git clone https://github.com/jamiepine/voicebox.git
cd voicebox

# Install just task runner
brew install just        # macOS
cargo install just       # any platform

# Set up Python venv + all dependencies
just setup

# Start backend + desktop app in dev mode
just dev
```

```bash
# List all available commands
just --list
```

---

## Architecture

| Layer | Technology |
|---|---|
| Desktop App | Tauri (Rust) |
| Frontend | React + TypeScript + Tailwind CSS |
| State | Zustand + React Query |
| Backend | FastAPI (Python) on port 17493 |
| TTS Engines | Qwen3-TTS, LuxTTS, Chatterbox, Chatterbox Turbo, TADA |
| Effects | Pedalboard (Spotify) |
| Transcription | Whisper / Whisper Turbo |
| Inference | MLX (Apple Silicon) / PyTorch (CUDA/ROCm/XPU/CPU) |
| Database | SQLite |

The Python FastAPI backend handles all ML inference. The Tauri Rust shell wraps the frontend and manages the backend process lifecycle. The API is accessible directly at `http://localhost:17493` even when using the desktop app.

---

## REST API Reference

Base URL: `http://localhost:17493`

Interactive docs: `http://localhost:17493/docs`

### Generate Speech

```bash
# Basic generation
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello world, this is a voice clone.",
    "profile_id": "abc123",
    "language": "en"
  }'

# With engine selection
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Speak slowly and with gravitas.",
    "profile_id": "abc123",
    "language": "en",
    "engine": "qwen3-tts"
  }'

# With paralinguistic tags (Chatterbox Turbo only)
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "That is absolutely hilarious! [laugh] I cannot believe it.",
    "profile_id": "abc123",
    "engine": "chatterbox-turbo",
    "language": "en"
  }'
```

### Voice Profiles

```bash
# List all profiles
curl http://localhost:17493/profiles

# Create a new profile
curl -X POST http://localhost:17493/profiles \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Narrator",
    "language": "en",
    "description": "Deep narrative voice"
  }'

# Upload audio sample to a profile
curl -X POST http://localhost:17493/profiles/{profile_id}/samples \
  -F "file=@/path/to/voice-sample.wav"

# Export a profile
curl http://localhost:17493/profiles/{profile_id}/export \
  --output narrator-profile.zip

# Import a profile
curl -X POST http://localhost:17493/profiles/import \
  -F "file=@narrator-profile.zip"
```

### Generation Queue & Status

```bash
# Get generation status (SSE stream)
curl -N http://localhost:17493/generate/{generation_id}/status

# List recent generations
curl http://localhost:17493/generations

# Retry a failed generation
curl -X POST http://localhost:17493/generations/{generation_id}/retry

# Download generated audio
curl http://localhost:17493/generations/{generation_id}/audio \
  --output output.wav
```

### Models

```bash
# List available models and download status
curl http://localhost:17493/models

# Unload a model from GPU memory (without deleting)
curl -X POST http://localhost:17493/models/{model_id}/unload
```

---

## TypeScript/JavaScript Integration

### Basic TTS Client

```typescript
const VOICEBOX_URL = process.env.VOICEBOX_API_URL ?? "http://localhost:17493";

interface GenerateRequest {
  text: string;
  profile_id: string;
  language?: string;
  engine?: "qwen3-tts" | "luxtts" | "chatterbox" | "chatterbox-turbo" | "tada";
}

interface GenerateResponse {
  generation_id: string;
  status: "queued" | "processing" | "complete" | "failed";
  audio_url?: string;
}

async function generateSpeech(req: GenerateRequest): Promise<GenerateResponse> {
  const response = await fetch(`${VOICEBOX_URL}/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });

  if (!response.ok) {
    throw new Error(`Voicebox API error: ${response.status} ${await response.text()}`);
  }

  return response.json();
}

// Usage
const result = await generateSpeech({
  text: "Welcome to our application.",
  profile_id: "abc123",
  language: "en",
  engine: "qwen3-tts",
});

console.log("Generation ID:", result.generation_id);
```

### Poll for Completion

```typescript
async function waitForGeneration(
  generationId: string,
  timeoutMs = 60_000
): Promise<string> {
  const start = Date.now();

  while (Date.now() - start < timeoutMs) {
    const res = await fetch(`${VOICEBOX_URL}/generations/${generationId}`);
    const data = await res.json();

    if (data.status === "complete") {
      return `${VOICEBOX_URL}/generations/${generationId}/audio`;
    }
    if (data.status === "failed") {
      throw new Error(`Generation failed: ${data.error}`);
    }

    await new Promise((r) => setTimeout(r, 1000));
  }

  throw new Error("Generation timed out");
}
```

### Stream Status with SSE

```typescript
function streamGenerationStatus(
  generationId: string,
  onStatus: (status: string) => void
): () => void {
  const eventSource = new EventSource(
    `${VOICEBOX_URL}/generate/${generationId}/status`
  );

  eventSource.onmessage = (event) => {
    const data = JSON.parse(event.data);
    onStatus(data.status);

    if (data.status === "complete" || data.status === "failed") {
      eventSource.close();
    }
  };

  eventSource.onerror = () => eventSource.close();

  // Return cleanup function
  return () => eventSource.close();
}

// Usage
const cleanup = streamGenerationStatus("gen_abc123", (status) => {
  console.log("Status update:", status);
});
```

### Download Audio as Blob

```typescript
async function downloadAudio(generationId: string): Promise<Blob> {
  const response = await fetch(
    `${VOICEBOX_URL}/generations/${generationId}/audio`
  );

  if (!response.ok) {
    throw new Error(`Failed to download audio: ${response.status}`);
  }

  return response.blob();
}

// Play in browser
async function playGeneratedAudio(generationId: string): Promise<void> {
  const blob = await downloadAudio(generationId);
  const url = URL.createObjectURL(blob);
  const audio = new Audio(url);
  audio.play();
  audio.onended = () => URL.revokeObjectURL(url);
}
```

---

## Python Integration

```python
import asyncio

import httpx

VOICEBOX_URL = "http://localhost:17493"

async def generate_speech(
    text: str,
    profile_id: str,
    language: str = "en",
    engine: str = "qwen3-tts"
) -> bytes:
    async with httpx.AsyncClient(timeout=120.0) as client:
        # Submit generation
        resp = await client.post(
            f"{VOICEBOX_URL}/generate",
            json={
                "text": text,
                "profile_id": profile_id,
                "language": language,
                "engine": engine,
            }
        )
        resp.raise_for_status()
        generation_id = resp.json()["generation_id"]

        # Poll until complete
        for _ in range(120):
            status_resp = await client.get(
                f"{VOICEBOX_URL}/generations/{generation_id}"
            )
            status_data = status_resp.json()

            if status_data["status"] == "complete":
                audio_resp = await client.get(
                    f"{VOICEBOX_URL}/generations/{generation_id}/audio"
                )
                return audio_resp.content

            if status_data["status"] == "failed":
                raise RuntimeError(f"Generation failed: {status_data.get('error')}")

            await asyncio.sleep(1.0)

        raise TimeoutError("Generation timed out after 120s")


# Usage
audio_bytes = asyncio.run(
    generate_speech(
        text="The quick brown fox jumps over the lazy dog.",
        profile_id="your-profile-id",
        language="en",
        engine="chatterbox",
    )
)

with open("output.wav", "wb") as f:
    f.write(audio_bytes)
```

---

## TTS Engine Selection Guide

| Engine | Best For | Languages | VRAM | Notes |
|---|---|---|---|---|
| `qwen3-tts` (0.6B/1.7B) | Quality + instructions | 10 | Medium | Supports delivery instructions in text |
| `luxtts` | Fast CPU generation | English only | ~1GB | 150x realtime on CPU, 48kHz |
| `chatterbox` | Multilingual coverage | 23 | Medium | Arabic, Hindi, Swahili, CJK + more |
| `chatterbox-turbo` | Expressive/emotion | English only | Low (350M) | Use `[laugh]`, `[sigh]`, `[gasp]` tags |
| `tada` (1B/3B) | Long-form coherence | 10 | High | 700s+ audio, HumeAI model |

### Delivery Instructions (Qwen3-TTS)

Embed natural language instructions directly in the text:

```typescript
await generateSpeech({
  text: "(whisper) I have a secret to tell you.",
  profile_id: "abc123",
  engine: "qwen3-tts",
});

await generateSpeech({
  text: "(speak slowly and clearly) Step one: open the application.",
  profile_id: "abc123",
  engine: "qwen3-tts",
});
```

### Paralinguistic Tags (Chatterbox Turbo)

```typescript
const tags = [
  "[laugh]", "[chuckle]", "[gasp]", "[cough]",
  "[sigh]", "[groan]", "[sniff]", "[shush]", "[clear throat]"
];

await generateSpeech({
  text: "Oh really? [gasp] I had no idea! [laugh] That's incredible.",
  profile_id: "abc123",
  engine: "chatterbox-turbo",
});
```

---

## Environment & Configuration

```bash
# Custom models directory (set before launching)
export VOICEBOX_MODELS_DIR=/path/to/models

# For AMD ROCm GPU (auto-configured, but can override)
export HSA_OVERRIDE_GFX_VERSION=11.0.0
```

Docker configuration (`docker-compose.yml` override):

```yaml
services:
  voicebox:
    environment:
      - VOICEBOX_MODELS_DIR=/models
    volumes:
      - /host/models:/models
    ports:
      - "17493:17493"
    # For NVIDIA GPU passthrough:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

---

## Common Patterns

### Voice Profile Creation Flow

```typescript
// 1. Create profile
const profile = await fetch(`${VOICEBOX_URL}/profiles`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ name: "My Voice", language: "en" }),
}).then((r) => r.json());

// 2. Upload audio sample (WAV/MP3, ideally 5–30 seconds clean speech)
const formData = new FormData();
formData.append("file", audioBlob, "sample.wav");

await fetch(`${VOICEBOX_URL}/profiles/${profile.id}/samples`, {
  method: "POST",
  body: formData,
});

// 3. Generate with the new profile
const gen = await generateSpeech({
  text: "Testing my cloned voice.",
  profile_id: profile.id,
});
```

### Batch Generation with Queue

```typescript
async function batchGenerate(
  items: Array<{ text: string; profileId: string }>,
  engine = "qwen3-tts"
): Promise<string[]> {
  // Submit all — Voicebox queues them serially to avoid GPU contention
  const submissions = await Promise.all(
    items.map((item) =>
      generateSpeech({ text: item.text, profile_id: item.profileId, engine })
    )
  );

  // Wait for all completions
  const audioUrls = await Promise.all(
    submissions.map((s) => waitForGeneration(s.generation_id))
  );

  return audioUrls;
}
```

### Long-Form Text (Auto-Chunking)

Voicebox auto-chunks at sentence boundaries — just send the full text:

```typescript
// Up to 50,000 characters supported per request
const longScript = `
  Chapter one. The morning fog rolled across the valley floor...
`;

await generateSpeech({
  text: longScript,
  profile_id: "narrator-profile-id",
  engine: "tada", // Best for long-form coherence
  language: "en",
});
```

---

## Troubleshooting

### API not responding

```bash
# Check if backend is running
curl http://localhost:17493/health

# Restart backend only (dev mode)
just backend

# Check logs
just logs
```

### GPU not detected

```bash
# Check detected backend
curl http://localhost:17493/system/info

# Force CPU mode (set before launch)
export VOICEBOX_FORCE_CPU=1
```

### Model download fails / slow

```bash
# Set custom models directory with more space
export VOICEBOX_MODELS_DIR=/path/with/space
just dev

# Cancel stuck download via API
curl -X DELETE http://localhost:17493/models/{model_id}/download
```

### Out of VRAM — unload models

```bash
# List loaded models
curl http://localhost:17493/models | jq '.[] | select(.loaded == true)'

# Unload specific model
curl -X POST http://localhost:17493/models/{model_id}/unload
```

### Audio quality issues

- Use 5–30 seconds of clean, noise-free speech for voice samples
- Multiple samples improve clone quality — upload 3–5 different sentences
- For multilingual cloning, use the `chatterbox` engine
- Ensure sample audio is 16kHz+ mono WAV for best results
- Use `luxtts` for the highest output quality (48kHz) in English

### Generation stuck in queue after crash

Voicebox auto-recovers stale generations on startup. If the issue persists:

```bash
curl -X POST http://localhost:17493/generations/{generation_id}/retry
```

---

## Frontend Integration (React Example)

```tsx
import { useState } from "react";

const VOICEBOX_URL = import.meta.env.VITE_VOICEBOX_URL ?? "http://localhost:17493";

export function VoiceGenerator({ profileId }: { profileId: string }) {
  const [text, setText] = useState("");
  const [audioUrl, setAudioUrl] = useState<string | null>(null);
  const [loading, setLoading] = useState(false);

  const handleGenerate = async () => {
    setLoading(true);
    try {
      const res = await fetch(`${VOICEBOX_URL}/generate`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ text, profile_id: profileId, language: "en" }),
      });
      const { generation_id } = await res.json();

      // Poll for completion
      let done = false;
      while (!done) {
        await new Promise((r) => setTimeout(r, 1000));
        const statusRes = await fetch(`${VOICEBOX_URL}/generations/${generation_id}`);
        const { status } = await statusRes.json();
        if (status === "complete") {
          setAudioUrl(`${VOICEBOX_URL}/generations/${generation_id}/audio`);
          done = true;
        } else if (status === "failed") {
          throw new Error("Generation failed");
        }
      }
    } finally {
      setLoading(false);
    }
  };

  return (
    <div>
      <textarea value={text} onChange={(e) => setText(e.target.value)} />
      <button onClick={handleGenerate} disabled={loading}>
        {loading ? "Generating..." : "Generate Speech"}
      </button>
      {audioUrl && <audio controls src={audioUrl} />}
    </div>
  );
}
```
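### Scripts Beyond the 50,000-Character Limit

The long-form pattern earlier caps a single request at 50,000 characters. For scripts beyond that, one option is to split on the client and submit the pieces separately. This is a hypothetical sketch, not part of Voicebox: `splitScript` and its sentence regex are assumptions, and anything under the limit should simply be sent whole so Voicebox's own chunker can handle it.

```typescript
// Hypothetical helper: split a long script into chunks at sentence
// boundaries so each chunk stays under a character ceiling.
function splitScript(text: string, maxLen = 50_000): string[] {
  // Greedy sentence match: run of non-terminators, then terminators, then whitespace.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const chunks: string[] = [];
  let current = "";

  for (const sentence of sentences) {
    // Start a new chunk when adding this sentence would exceed the ceiling.
    if (current.length + sentence.length > maxLen && current) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Each returned chunk can then go through a separate `/generate` call (e.g. via the `batchGenerate` pattern above) and the resulting audio files concatenated afterwards.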