Name: Videoagent Audio Studio
Author: Pexoai

Install

Terminal · npx

$ npx skills add https://github.com/pexoai/pexo-skills --skill videoagent-audio-studio

Works with Paperclip

How Videoagent Audio Studio fits into a Paperclip company.

Videoagent Audio Studio drops into any Paperclip agent that handles video work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat, with no prompt engineering and no tool wiring.


Source file

SKILL.md (259 lines, markdown)


---
name: videoagent-audio-studio
version: 3.0.0
author: "wells"
emoji: "🎙️"
tags:
  - video
  - audio
  - tts
  - music
  - sfx
  - voice-clone
  - elevenlabs
  - fal
description: >
  Tired of juggling multiple audio APIs? This skill gives you one-command access to TTS, music generation, sound effects, and voice cloning. Use when you want to generate any audio without managing multiple API keys.
homepage: https://github.com/pexoai/audiomind-skill
metadata:
  openclaw:
    emoji: "🎙️"
    primaryEnv: ELEVENLABS_API_KEY
    requires:
      env:
        - ELEVENLABS_API_KEY
    install:
      - id: elevenlabs-mcp
        kind: npm
        package: "@elevenlabs/mcp"
        label: "Install ElevenLabs MCP server"
---

# 🎙️ VideoAgent Audio Studio

**Use when:** User asks to generate speech, narrate text, create a voice-over, compose music, or produce a sound effect.

VideoAgent Audio Studio is a smart audio dispatcher. It analyzes your request, routes it to the best available model (ElevenLabs for speech and music, fal.ai for fast SFX), and returns a ready-to-use audio URL.

---

## Quick Reference

| Request Type | Best Model | Latency |
|---|---|---|
| Narrate text / Voice-over | `elevenlabs-tts-v3` | ~3s |
| Low-latency TTS (real-time) | `elevenlabs-tts-turbo` | <1s |
| Background music | `cassetteai-music` | ~15s |
| Sound effect | `elevenlabs-sfx` | ~5s |
| Clone a voice from audio | `elevenlabs-voice-clone` | ~10s |

---

## How to Use

### 1. Start the AudioMind server (once per session)

```bash
bash {baseDir}/tools/start_server.sh
```

This starts the ElevenLabs MCP server on port 8124. The skill uses it for all audio generation.

### 2. Route the request

Analyze the user's request and call the appropriate tool via the MCP server:

**Text-to-Speech (TTS)**

When user asks to "narrate", "read aloud", "say", or "create a voice-over":

```
Use MCP tool: text_to_speech
  text: "<the text to narrate>"
  voice_id: "JBFqnCBsd6RMkjVDRZzb"   # Default: "George" (professional, neutral)
  model_id: "eleven_multilingual_v2"   # Use "eleven_turbo_v2_5" for low latency
```

**Music Generation**

When user asks to "compose", "create background music", or "make a soundtrack":

```
Use MCP tool: text_to_sound_effects  (via cassetteai-music on fal.ai)
  prompt: "<music description, e.g. 'upbeat lo-fi hip hop, 90 seconds'>"
  duration_seconds: <duration>
```

**Sound Effect (SFX)**

When user asks for a specific sound (e.g., "a door creaking", "rain on a window"):

```
Use MCP tool: text_to_sound_effects
  text: "<sound description>"
  duration_seconds: <1-22>
```

**Voice Cloning**

When user provides an audio sample and wants to clone the voice:

```
Use MCP tool: voice_add
  name: "<voice name>"
  files: ["<audio_file_url>"]
```

---

## Example Conversations

**User:** "Voice this text for me: Welcome to our product launch"

```
→ Route to: text_to_speech
  text: "Welcome to our product launch"
  voice_id: "JBFqnCBsd6RMkjVDRZzb"
  model_id: "eleven_multilingual_v2"
```

> 🎙️ Voiceover done! [Listen here](audio_url)

---

**User:** "Generate 60 seconds of relaxing background music for a podcast"

```
→ Route to: cassetteai-music (fal.ai)
  prompt: "relaxing lo-fi background music for a podcast, gentle piano and soft beats, 60 seconds"
  duration_seconds: 60
```

> 🎵 Background music ready! [Listen here](audio_url)

---

**User:** "Generate a sci-fi style door opening sound effect"

```
→ Route to: text_to_sound_effects
  text: "a futuristic sci-fi door sliding open with a hydraulic hiss"
  duration_seconds: 3
```

---

## Setup

### Required

Set `ELEVENLABS_API_KEY` in `~/.openclaw/openclaw.json`:

```json
{
  "skills": {
    "entries": {
      "videoagent-audio-studio": {
        "enabled": true,
        "env": {
          "ELEVENLABS_API_KEY": "your_elevenlabs_key_here"
        }
      }
    }
  }
}
```

Get your key at [elevenlabs.io/app/settings/api-keys](https://elevenlabs.io/app/settings/api-keys).

### Optional (for fal.ai music & SFX models)

```json
"FAL_KEY": "your_fal_key_here"
```

Get your key at [fal.ai/dashboard/keys](https://fal.ai/dashboard/keys).

---

## Self-Hosting the Proxy

The `cli.js` connects to a hosted proxy by default. If you want full control, or need to serve users in regions where `vercel.app` is blocked, you can deploy your own instance from the `proxy/` directory.

### Quick Deploy (Vercel)

```bash
cd proxy
npm install
vercel --prod
```

### Environment Variables

Set these in your Vercel project (Dashboard → Settings → Environment Variables):

| Variable | Required For | Where to Get |
|---|---|---|
| `ELEVENLABS_API_KEY` | TTS, SFX, Voice Clone | [elevenlabs.io/app/settings/api-keys](https://elevenlabs.io/app/settings/api-keys) |
| `FAL_KEY` | Music generation | [fal.ai/dashboard/keys](https://fal.ai/dashboard/keys) |
| `VALID_PRO_KEYS` | (Optional) Restrict access | Comma-separated list of allowed client keys |

### Point cli.js to Your Proxy

```bash
export AUDIOMIND_PROXY_URL="https://your-domain.com/api/audio"
```

Or set it in `~/.openclaw/openclaw.json`:

```json
{
  "skills": {
    "entries": {
      "videoagent-audio-studio": {
        "env": {
          "AUDIOMIND_PROXY_URL": "https://your-domain.com/api/audio"
        }
      }
    }
  }
}
```

### Custom Domain (Recommended)

If your users are in mainland China, bind a custom domain in Vercel Dashboard → Settings → Domains to avoid DNS issues with `vercel.app`.

---

## Model Reference

| Model ID | Type | Provider | Notes |
|---|---|---|---|
| `eleven_multilingual_v2` | TTS | ElevenLabs | Best quality, supports 29 languages |
| `eleven_turbo_v2_5` | TTS | ElevenLabs | Ultra-low latency, ideal for real-time |
| `eleven_monolingual_v1` | TTS | ElevenLabs | English only, fastest |
| `cassetteai-music` | Music | fal.ai | Reliable, fast music generation |
| `elevenlabs-sfx` | SFX | ElevenLabs | High-quality sound effects (up to 22s) |
| `elevenlabs-voice-clone` | Clone | ElevenLabs | Clone any voice from a short audio sample |

---

## Changelog

### v3.0.0
- **Simplified routing table**: Removed unstable/offline models from the main reference. The skill now only surfaces models that reliably work.
- **Clearer use-case triggers**: Added "Use when" section so the agent activates this skill at the right moment.
- **Unified setup**: Single `ELEVENLABS_API_KEY` is all you need to get started. `FAL_KEY` is now optional.
- **Removed polling complexity**: Music generation now uses `cassetteai-music` by default, which completes synchronously.

### v2.1.0
- Added async workflow for long-running music generation tasks.
- Added `cassetteai-music` as a stable alternative for music generation.

### v2.0.0
- Migrated to ElevenLabs MCP server architecture.
- Added voice cloning support.

### v1.0.0
- Initial release with TTS, music, and SFX routing.
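
The routing rules in the "How to Use" section of SKILL.md can be sketched as a small keyword dispatcher. This is only an illustration, not the skill's actual implementation: the `ROUTES` table, the `route` and `sfx_params` helpers, and the choice to fall back to `text_to_sound_effects` when no trigger word matches are all assumptions. The trigger phrases, tool names, and the 1 to 22 second SFX range are taken from SKILL.md.

```python
import re

# Hypothetical routing table. Trigger phrases and tool names come from SKILL.md;
# the table itself and its ordering are assumptions made for this sketch.
ROUTES = [
    ("voice_add", ("clone",)),
    ("text_to_speech", ("narrate", "read aloud", "say", "voice-over")),
    ("cassetteai-music", ("compose", "background music", "soundtrack")),
]


def route(request: str) -> str:
    """Return the tool or model name that should handle a user request."""
    text = request.lower()
    for tool, phrases in ROUTES:
        for phrase in phrases:
            # Whole-word match so that "say" does not fire inside "essay".
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                return tool
    # Assumption: anything without a trigger phrase is treated as a sound effect.
    return "text_to_sound_effects"


def sfx_params(description: str, seconds: int) -> dict:
    """Build SFX parameters, clamping the duration to the documented 1-22 s range."""
    return {"text": description, "duration_seconds": max(1, min(22, seconds))}


if __name__ == "__main__":
    print(route("Please narrate this paragraph"))
    print(route("Compose a soundtrack for my video"))
    print(sfx_params("rain on a window", 60))
```

A real agent would more likely let the language model choose the tool from the "Use when" description; the keyword table only makes the documented mapping concrete.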

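The setup described in SKILL.md implies that some features depend on which keys are configured. The sketch below reads a config shaped like the `~/.openclaw/openclaw.json` example and reports which features have a key available. The `available_features` helper is hypothetical, and it assumes that `FAL_KEY` lives in the same `env` object as `ELEVENLABS_API_KEY`, which SKILL.md does not state explicitly. The key-to-feature mapping follows the "Environment Variables" table.

```python
import json

SKILL_NAME = "videoagent-audio-studio"


def available_features(config: dict) -> dict:
    """Map each feature to True if the key it needs is present and non-empty."""
    env = (
        config.get("skills", {})
        .get("entries", {})
        .get(SKILL_NAME, {})
        .get("env", {})
    )
    has_elevenlabs = bool(env.get("ELEVENLABS_API_KEY"))
    has_fal = bool(env.get("FAL_KEY"))  # assumption: stored next to the ElevenLabs key
    return {
        "tts": has_elevenlabs,
        "sfx": has_elevenlabs,
        "voice_clone": has_elevenlabs,
        "music": has_fal,
    }


if __name__ == "__main__":
    # Same shape as the "Required" example in SKILL.md, with a placeholder key.
    sample = json.loads("""
    {
      "skills": {
        "entries": {
          "videoagent-audio-studio": {
            "enabled": true,
            "env": {"ELEVENLABS_API_KEY": "your_elevenlabs_key_here"}
          }
        }
      }
    }
    """)
    print(available_features(sample))
```

With only the ElevenLabs key set, the sketch reports music as unavailable, which matches the table listing `FAL_KEY` as required for music generation.
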
Related skills

Pexo Agent

A solid integration for generating short videos through Pexo's AI platform without leaving your Claude workflow. Handles the full pipeline from uploading assets

Seedance 2.0 Prompter

Install Seedance 2.0 Prompter skill for Claude Code from pexoai/pexo-skills.

Videoagent Image Studio

This is exactly what you need if you're tired of managing API keys for different image generation services. It gives Claude direct access to Midjourney, Flux, I