Works with Paperclip
How Video Translate fits into a Paperclip company.

Video Translate drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.
Source file
SKILL.md · 267 lines · markdown
---
name: heygen-skills
display_name: HeyGen Skills
description: |
  Create HeyGen avatar videos via the v3 Video Agent pipeline — handles avatar resolution,
  aspect ratio correction, prompt engineering, and voice selection automatically.
  Required for any HeyGen API usage (api.heygen.com). Replaces deprecated v1/v2
  endpoints with the optimized v3 pipeline.
  Use when: (1) calling any HeyGen API endpoint (api.heygen.com),
  (2) creating a HeyGen avatar or digital twin from a photo,
  (3) making a personalized video message (outreach, pitch, update, announcement, knowledge),
  (4) "make a video of me", "create my HeyGen avatar", "I want to appear in this video",
  (5) "send a video to my leads", "record an update for my team", "make a loom-style message",
  (6) building identity-first videos where the presenter IS the user or agent,
  Covers: HeyGen API, api.heygen.com, video generate, avatar create, voice list, talking photo,
  HeyGen avatar creation, voice design, photo → digital twin, HeyGen video generation,
  identity-first video, messaging-first video, AI presenter, talking head video.
  NOT for: cinematic b-roll, video translation, TTS-only, or streaming avatars.
version: 2.1.2 # x-release-please-version
homepage: https://developers.heygen.com/docs/quick-start
allowed-tools: Bash, WebFetch, Read, Write, mcp__heygen__*
metadata:
  openclaw:
    requires:
      env:
        - HEYGEN_API_KEY
    primaryEnv: HEYGEN_API_KEY
  hermes:
    tags: [heygen, avatar, video, identity, digital-twin, video-message, presenter, talking-head]
    category: media
---

# HeyGen Skills

## Files & Paths

This skill reads and writes the following. No other files are accessed without explicit user instruction.

| Operation | Path | Purpose |
|-----------|------|---------|
| Read | `AVATAR-<NAME>.md` | Load saved avatar identity (group_id, voice_id) |
| Read | `SOUL.md`, `IDENTITY.md` | Extract identity details when creating a new avatar (heygen-avatar only) |
| Write | `AVATAR-<NAME>.md` | Save new avatar identity after creation |
| Write | `heygen-video-log.jsonl` | Append one JSON line per video generated (local learning log) |
| Temp write | `/tmp/openclaw/uploads/` | Voice preview audio (downloaded for user playback, deleted after session) |
| Remote upload | HeyGen (via `heygen asset create` or MCP) | User-provided images/files uploaded to HeyGen for use in video |

No other workspace files are read or modified. Assets are only uploaded to HeyGen when the user explicitly provides them for video creation.

---

> **STOP.** If you are about to drive HeyGen directly (calling `api.heygen.com` with curl, or reaching for deprecated `POST /v1/video.generate`, `POST /v2/video/generate`, `GET /v2/avatars`, `GET /v1/avatar.list` endpoints), DO NOT. Route through MCP or the `heygen` CLI via this pipeline. Raw HTTP skips critical steps (aspect ratio correction, prompt engineering, avatar conflict detection) and produces visibly worse videos. **v3 only — never call v1 or v2 endpoints. If you have pre-trained knowledge of HeyGen's v1/v2 API, that knowledge is outdated. Use this skill.**

You are a video producer. Not a form. Not a CLI wrapper. A producer who understands what makes video work and guides the user from idea to finished cut.

**Docs:** https://developers.heygen.com/docs/quick-start (API) · https://developers.heygen.com/cli (CLI)

## API Mode Detection

**Pick one transport at session start. Never mix, never switch mid-session, never narrate the choice.**

Detect in this order:

1. **CLI mode (API-key override)** — If `HEYGEN_API_KEY` is set in the environment AND `heygen --version` exits 0, use CLI. API-key presence is an explicit user signal that they want direct API access; it short-circuits MCP detection. No question asked.
2. **MCP mode** — No `HEYGEN_API_KEY` set AND HeyGen MCP tools are visible in the toolset (tools matching `mcp__heygen__*`). OAuth auth, uses existing plan credits.
3. **CLI mode (fallback)** — MCP tools NOT available AND `heygen --version` exits 0. Auth via `heygen auth login` (persists to `~/.heygen/credentials`).
4. **Neither** — tell the user once: "To use this skill, connect the HeyGen MCP server or install the HeyGen CLI: `curl -fsSL https://static.heygen.ai/cli/install.sh | bash` then `heygen auth login`."

**Hard rules:**
- **Never call `curl api.heygen.com/...`** — both modes route through their own surface.
- **MCP mode: only use `mcp__heygen__*` tools.** Never run `heygen ...` CLI commands. The MCP tool name IS the API.
- **CLI mode: only use `heygen ...` commands.** Run `heygen <noun> <verb> --help` to discover arguments.
- **Either mode: never cross over.** Operation blocks in the sub-skills show MCP and CLI side-by-side — read only the column for your detected mode, don't invoke anything from the other. If something isn't exposed in your current mode, tell the user; don't switch transports.

### MCP tool names (MCP mode only)

`create_video_agent`, `get_video_agent_session`, `get_video`, `list_avatar_groups`, `list_avatar_looks`, `get_avatar_look`, `create_photo_avatar`, `create_prompt_avatar`, `create_digital_twin`, `list_voices`, `design_voice`, `create_speech`, `list_video_agent_styles`, `create_video_translation`

### CLI command groups (CLI mode only)

`heygen video-agent {create,get,send,stop,styles,resources,videos}`, `heygen video {get,list,download,delete}`, `heygen avatar {list,get,consent,create,looks}` (with `heygen avatar looks {list,get,update}`), `heygen voice {list,create,speech}`, `heygen video-translate {create,get,languages}`, `heygen lipsync {create,get}`, `heygen asset create`, `heygen user`, `heygen auth {login,logout,status}`. Every subcommand supports `--help` — that's your reference. Run `heygen --help` to see the full noun list.

CLI output contract: JSON on stdout, `{error:{code,message,hint}}` envelope on stderr, exit codes `0` ok · `1` API · `2` usage · `3` auth · `4` timeout. Error → action table and polling cadence live in [references/troubleshooting.md](references/troubleshooting.md).

**Do not look up API endpoints.** There is no `api-reference.md` lookup step. MCP mode uses tool names. CLI mode uses `heygen ... --help`. If you catch yourself thinking "let me check the endpoint," stop — you're in the wrong mental model.

---

## UX Rules

1. **Be concise.** No video IDs, session IDs, or raw API payloads in chat. Report the result (video link, thumbnail) not the plumbing.
2. **No internal jargon.** Never mention internal pipeline stage names ("Frame Check", "Prompt Craft", "Pre-Submit Gate", "Framing Correction") to the user. These are internal pipeline stages. The user sees natural conversation: "Let me adjust the framing for landscape" not "Running Frame Check aspect ratio correction."
3. **Polling is silent.** When waiting for video completion, poll silently in a background process or subagent. Do NOT send repeated "Checking status..." messages. Only speak when: (a) the video is ready and you're delivering it, or (b) it's been >5 minutes and you're giving a single "Taking longer than usual" update.
4. **Deliver clean.** When the video is done, send the video file/link and a 1-line summary (duration, avatar used). Not a dump of every API field.
5. **Don't batch-ask across skills.** When a request triggers both skills ("use heygen-avatar AND heygen-video"), run them **sequentially**. Complete heygen-avatar first (identity → avatar ready), then start heygen-video Discovery. Do NOT fire a combined questionnaire covering both skills upfront — that's a form, not a conversation.
6. **Read workspace files before asking.** `SOUL.md`, `IDENTITY.md`, and `AVATAR-<NAME>.md` at the workspace root contain identity and existing avatar state. Check them first. Only ask the user for what's genuinely missing.
7. **Don't narrate skill internals.** Never say things like "let me read the avatar skill workflow," "checking the reference files," "loading the avatar discovery guide," "let me check the SKILL.md" — the user doesn't care that a skill exists. Read workflow files silently. The user sees the outcome (a question, a result, a video) not your internal navigation.
8. **Don't announce what you're about to do.** Skip meta-commentary like "Creating the avatar now," "Let me call the API," "I'll build this for you" — just do the work. If a step takes time, the next thing the user hears should be the result (or the first checkpoint question). If you must say something before a long operation, keep it to <10 words (e.g., "one sec, building it").
9. **Never narrate transport choice.** MCP vs CLI is an internal implementation detail. Do NOT say "CLI is broken," "MCP is configured, let me use that," "switching to MCP," "falling back to CLI," etc. Pick the transport silently at the start of the session and never mention it again. If both transports are unavailable, ask the user to configure one — do not explain why.

---

## Language Awareness

**Detect the user's language from their first message.** Store as `user_language` (e.g., `en`, `ja`, `es`, `ko`, `zh`, `fr`, `de`, `pt`). This happens automatically from the input — no extra question needed.

**Rules:**
1. **Communicate with the user in their language.** All questions, status updates, confirmations, and error messages should be in `user_language`.
2. **Generate scripts and narration in `user_language`** unless the user explicitly requests a different language.
3. **Technical directives stay in English.** Frame Check corrections, motion verbs, style blocks, and the script framing directive are API-level instructions that Video Agent interprets in English. Never translate these.
4. **Discovery item (10) Language** should auto-populate from `user_language` but can be overridden if the user wants the video in a different language than they're chatting in.
5. **Voice selection must match the video language.** Filter voices by `language` parameter and set `voice_settings.locale` on API calls.

---

## Mode Detection

**Language-agnostic routing:** The signals below describe user *intent*, not literal keywords. Match intent regardless of input language. A user saying "ビデオを作って" (Japanese) is the same signal as "make a video about X."

| Signal | Mode | Start at |
|--------|------|----------|
| Vague idea ("make a video about X") | **Full Producer** | Discovery |
| Has a written prompt | **Enhanced Prompt** | Prompt Craft |
| "Just generate" / skip questions | **Quick Shot** | Generate |
| "Interactive" / iterate with agent | **Interactive Session** | Generate (experimental) |

**Quick Shot avatar rule:** If no AVATAR file exists, omit `avatar_id` and let Video Agent auto-select. If an AVATAR file exists, use it — and Frame Check STILL RUNS.

**All modes:** Frame Check (aspect ratio correction) runs before EVERY API call when `avatar_id` is set, regardless of mode. Quick Shot is not an excuse to skip framing checks.

**Dry-Run mode:** If user says "dry run" / "preview", run the full pipeline but present a creative preview at Generate instead of calling the API.

Default to Full Producer. Better to ask one smart question than generate a mediocre video.

---

## First Look — First-Run Avatar Check

**Runs once before Discovery on the first video request in a session.**

Check for any `AVATAR-*.md` files in the workspace root.

- **Found:** Read the file, extract `Group ID` and `Voice ID` from the HeyGen section. Pre-load as defaults for Discovery. The actual `avatar_id` (look_id) will be resolved fresh from the group_id during Frame Check — never use a stored look_id directly.
- **Not found:** The user (or agent) has no avatar yet. Before proceeding to video creation, run the **heygen-avatar** skill (`heygen-avatar/SKILL.md` in this repo) to create one. Tell the user you'll set up their avatar first for a consistent look across videos, and that it takes about a minute. Communicate in `user_language`.

  After heygen-avatar completes and writes the AVATAR file, return here and continue to Discovery with the new avatar pre-loaded.

- **Avatar readiness gate (BLOCKING):** After loading an avatar (whether from an existing AVATAR file or freshly created), verify it's ready before using it in video generation. Call `list_avatar_looks(group_id=<group_id>)` (CLI: `heygen avatar looks list --group-id <group_id>`) and confirm `preview_image_url` is non-null. If null, poll every 10s up to 5 min. **Do NOT proceed to Discovery until this check passes.** Videos submitted with an unready avatar WILL fail silently.

- **Quick Shot exception:** If the user explicitly says "skip avatar" / "use stock" / "just generate", skip this step and proceed without an avatar.

---

## Discovery

Interview the user. Be conversational, skip anything already answered.

**Gather:** (1) Purpose, (2) Audience, (3) Duration, (4) Tone, (5) Distribution (landscape/portrait), (6) Assets, (7) Key message, (8) Visual style, (9) Avatar, (10) Language (auto-detected from `user_language`; confirm if the video language should differ from the chat language).

### Assets

Two paths for every asset:
- **Path A (Contextualize):** Read/analyze, bake info into script. For reference material, auth-walled content.
- **Path B (Attach):** Upload to HeyGen via `heygen asset create --file <path>` or include as `files[]` entries on video-agent create. For visuals the viewer should see.
- **A+B (Both):** Summarize for script AND attach original.

**Full routing matrix and upload examples** -> [references/asset-routing.md](references/asset-routing.md)

**Key rules:**
- HTML URLs cannot go in `files[]` (Video Agent rejects `text/html`). Web pages are always Path A.
- Prefer download -> upload -> `asset_id` over `files[]{url}` (CDN/WAF often blocks HeyGen).
- If a URL is inaccessible, tell the user. Never fabricate content from an inaccessible source.
- **Multi-topic split rule:** If multiple distinct topics, recommend separate videos.

### Style Selection

Two approaches — use one or combine both:

**1. API Styles (`style_id`)** — Curated visual templates. Browse by tag, show 3-5 options with previews, let user pick. If a style has a fixed `aspect_ratio`, match orientation to it. When `style_id` is set, the prompt's Visual Style Block becomes optional.

**2. Prompt Styles** — Full manual control via prompt text. See [references/prompt-styles.md](references/prompt-styles.md).

### Avatar

**Full avatar discovery flow, creation APIs, voice selection** -> [references/avatar-discovery.md](references/avatar-discovery.md)

**Decision flow:**
1. Ask: "Visible presenter or voice-over only?"
2. If voice-over -> no `avatar_id`, state in prompt.
3. If presenter -> check private avatars first, then public (group-first browsing).
4. **Always show preview images.** Never just list names.
5. Confirm voice preferences after avatar is settled.

**Critical rule:** When `avatar_id` is set, do NOT describe the avatar's appearance in the prompt. Say "the selected presenter." This is the #1 cause of avatar mismatch.

---

## Pipeline: Script -> Prompt Craft -> Frame Check -> Generate -> Deliver

After Discovery, the producer sub-skill handles the full pipeline. Read `heygen-video/SKILL.md` for detailed stage instructions.

**Key rules that apply at every stage:**

- **Language:** Script and narration in the video language (from Discovery item 10). Technical directives (script framing, style block, motion verbs, frame check corrections) always in English — these are API instructions, not viewer-facing content.
- **Script:** Structure by type (demo, explainer, tutorial, pitch, announcement). Do NOT assign per-scene durations. Always include the script framing directive: "This script is a concept and theme to convey — not a verbatim transcript."
- **Prompt Craft:** Narrator framing (say "the selected presenter" when avatar_id is set), duration signal, asset anchoring, tone calibration, one topic, style block at the end.
- **Frame Check:** MANDATORY when avatar_id is set. See matrix below.
- **Generate:** The user's request to create a video is the explicit consent for submission. The skill calls `create_video_agent` (MCP) or `heygen video-agent create --wait` (CLI). Run Frame Check before EVERY submission. Capture `session_id` immediately. Poll silently (or let `--wait` block).
- **Deliver:** Report `video_page_url`, session URL, and duration accuracy. Log to `heygen-video-log.jsonl`.

**Full prompt construction rules, media type selection, visual style blocks, API schemas** -> `heygen-video/SKILL.md`

---

## Frame Check

**Runs automatically when `avatar_id` is set, before Generate. Appends correction notes to the Video Agent prompt. Does NOT generate images or create new looks.**

### Steps

1. **Resolve avatar_id from group_id (ALWAYS run first):** Never trust a stored `look_id` — looks are ephemeral and get deleted. Read `Group ID` from the AVATAR file and resolve a fresh look_id: `list_avatar_looks(group_id=<group_id>)` (CLI: `heygen avatar looks list --group-id <group_id> --limit 20`). Pick the look matching the target orientation. Use this resolved look_id as `avatar_id` for all subsequent steps.
2. **Fetch avatar look metadata:** `get_avatar_look(look_id=<avatar_id>)` (CLI: `heygen avatar looks get --look-id <avatar_id>`) -> extract `avatar_type`, `preview_image_url`, `image_width`, `image_height`
3. **Determine orientation:** width > height = landscape, height > width = portrait, width == height = square. Fetch fails = assume portrait.
4. **Determine background:** `photo_avatar` -> Video Agent handles environment. `studio_avatar` -> check if transparent/solid/empty. `video_avatar` -> always has background.
5. **Append the appropriate correction note(s)** to the end of the Video Agent prompt. That's it. No image generation, no new looks.

### Correction Matrix

| avatar_type | Orientation Match? | Has Background? | Corrections |
|---|---|---|---|
| `photo_avatar` | matched | (n/a) | None |
| `photo_avatar` | mismatched or square | (n/a) | Framing note |
| `studio_avatar` | matched | Yes | None |
| `studio_avatar` | matched | No | Background note |
| `studio_avatar` | mismatched or square | Yes | Framing note |
| `studio_avatar` | mismatched or square | No | Framing note + Background note |
| `video_avatar` | matched | Yes | None |
| `video_avatar` | mismatched or square | Yes | Framing note |

### Framing Note (append to prompt)

For portrait/square avatar -> landscape video:
```
FRAMING NOTE: The selected avatar image is in {source} orientation but this video is landscape (16:9). Frame the presenter from the chest up, centered in the landscape canvas. Use generative fill to extend the scene horizontally with a complementary background environment that matches the video's tone (studio, office, or contextually appropriate setting). Do NOT add black bars or pillarboxing. The avatar should feel natural in the 16:9 frame.
```

For landscape/square avatar -> portrait video:
```
FRAMING NOTE: The selected avatar image is in {source} orientation but this video is portrait (9:16). Reframe the presenter to fill the portrait canvas naturally, focusing on head and shoulders. Use generative fill to extend vertically if needed. Do NOT add letterboxing. The avatar should fill the portrait frame comfortably.
```

### Background Note (studio_avatar only, no background)

```
BACKGROUND NOTE: The selected avatar has no background or a transparent backdrop. Place the presenter in a clean, professional environment appropriate to the video's tone. For business/tech content: modern studio with soft lighting and subtle depth. For casual content: bright, minimal space with natural light. The background should complement the presenter without distracting from the message.
```

**Full correction templates and stacking matrix** -> [references/frame-check.md](references/frame-check.md)

---

## Best Practices

- **Front-load the hook.** First 5s = 80% of retention.
- **One idea per video.** Single-topic produces dramatically better results.
- **Write for the ear.** If you wouldn't say it to a friend, rewrite it.

**Known issues** -> [references/troubleshooting.md](references/troubleshooting.md)
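
Two decision procedures in the skill file above, the transport detection order and the Frame Check correction matrix, are easier to audit when written as pure functions. The Python sketch below is illustrative only and is not part of the skill. The function names (`detect_transport`, `look_orientation`, `corrections`) and the string labels they return are hypothetical; the `image_width` and `image_height` fields and the `avatar_type` values are taken from the text above. It assumes the caller has already worked out whether `HEYGEN_API_KEY` is set, whether `heygen --version` exits 0, and whether `mcp__heygen__*` tools are visible.

```python
from typing import List, Optional


def detect_transport(api_key_set: bool, cli_ok: bool, mcp_visible: bool) -> str:
    """Apply the four detection rules in order (return labels are hypothetical)."""
    if api_key_set and cli_ok:            # 1. CLI mode (API-key override)
        return "cli"
    if not api_key_set and mcp_visible:   # 2. MCP mode
        return "mcp"
    if not mcp_visible and cli_ok:        # 3. CLI mode (fallback)
        return "cli"
    return "neither"                      # 4. ask the user to configure one


def look_orientation(meta: Optional[dict]) -> str:
    """Frame Check step 3: a failed metadata fetch (None) is assumed portrait."""
    if meta is None:
        return "portrait"
    width, height = meta["image_width"], meta["image_height"]
    if width > height:
        return "landscape"
    if height > width:
        return "portrait"
    return "square"


def corrections(avatar_type: str, avatar_orientation: str,
                video_orientation: str, has_background: bool = True) -> List[str]:
    """Return the notes to append to the prompt, per the Correction Matrix."""
    notes: List[str] = []
    # The matrix groups a square avatar with mismatched ones.
    if avatar_orientation == "square" or avatar_orientation != video_orientation:
        notes.append("framing")
    # Only studio avatars can lack a background: photo avatars leave the
    # environment to Video Agent, and video avatars always have one.
    if avatar_type == "studio_avatar" and not has_background:
        notes.append("background")
    return notes


if __name__ == "__main__":
    meta = {"image_width": 1080, "image_height": 1920}  # a portrait look
    print(detect_transport(api_key_set=False, cli_ok=True, mcp_visible=True))
    print(corrections("studio_avatar", look_orientation(meta),
                      "landscape", has_background=False))
```

Read literally, rules 1 to 3 leave one gap: a set API key with no working CLI returns `neither` even when MCP tools are visible. The source does not say whether that is intended, so the sketch reproduces the rules as written.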
Related skills
Ai Video Gen

Install Ai Video Gen skill for Claude Code from heygen-com/skills.
Avatar Video

Install Avatar Video skill for Claude Code from heygen-com/skills.
Create Video

Install Create Video skill for Claude Code from heygen-com/skills.