Claude Agent Skill · by Google Gemini

Gemini Live API Dev

Install the Gemini Live API Dev skill for Claude Code from google-gemini/gemini-skills.

Install
Terminal · npx
$ npx skills add https://github.com/google-gemini/gemini-skills --skill gemini-live-api-dev
Works with Paperclip

How Gemini Live API Dev fits into a Paperclip company.

Gemini Live API Dev drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.

Source file
SKILL.md · 285 lines
---
name: gemini-live-api-dev
description: Use this skill when building real-time, bidirectional streaming applications with the Gemini Live API. Covers WebSocket-based audio/video/text streaming, voice activity detection (VAD), native audio features, function calling, session management, ephemeral tokens for client-side auth, and all Live API configuration options. SDKs covered - google-genai (Python), @google/genai (JavaScript/TypeScript).
---

# Gemini Live API Development Skill

## Overview

The Live API enables **low-latency, real-time voice and video interactions** with Gemini over WebSockets. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses.

Key capabilities:

- **Bidirectional audio streaming** — real-time mic-to-speaker conversations
- **Video streaming** — send camera/screen frames alongside audio
- **Text input/output** — send and receive text within a live session
- **Audio transcriptions** — get text transcripts of both input and output audio
- **Voice Activity Detection (VAD)** — automatic interruption handling
- **Native audio** — thinking (with configurable `thinkingLevel`)
- **Function calling** — synchronous tool use
- **Google Search grounding** — ground responses in real-time search results
- **Session management** — context compression, session resumption, GoAway signals
- **Ephemeral tokens** — secure client-side authentication

> [!NOTE]
> The Live API currently **only supports WebSockets**. For WebRTC support or simplified integration, use a [partner integration](#partner-integrations).

## Models

- `gemini-3.1-flash-live-preview` — Optimized for low-latency, real-time dialogue. Native audio output, thinking (via `thinkingLevel`). 128k context window. **This is the recommended model for all Live API use cases.**

> [!WARNING]
> The following Live API models are **deprecated** and will be shut down. Migrate to `gemini-3.1-flash-live-preview`.
> - `gemini-2.5-flash-native-audio-preview-12-2025` — Migrate to `gemini-3.1-flash-live-preview`.
> - `gemini-live-2.5-flash-preview` — Released June 17, 2025. Shutdown: December 9, 2025.
> - `gemini-2.0-flash-live-001` — Released April 9, 2025. Shutdown: December 9, 2025.

## SDKs

- **Python**: `google-genai` — `pip install google-genai`
- **JavaScript/TypeScript**: `@google/genai` — `npm install @google/genai`

> [!WARNING]
> Legacy SDKs `google-generativeai` (Python) and `@google/generative-ai` (JS) are deprecated. Use the new SDKs above.

## Partner Integrations

To streamline real-time audio/video app development, use a third-party integration supporting the Gemini Live API over **WebRTC** or **WebSockets**:

- [LiveKit](https://docs.livekit.io/agents/models/realtime/plugins/gemini/) — Use the Gemini Live API with LiveKit Agents.
- [Pipecat by Daily](https://docs.pipecat.ai/guides/features/gemini-live) — Create a real-time AI chatbot using Gemini Live and Pipecat.
- [Fishjam by Software Mansion](https://docs.fishjam.io/tutorials/gemini-live-integration) — Create live video and audio streaming applications with Fishjam.
- [Vision Agents by Stream](https://visionagents.ai/integrations/gemini) — Build real-time voice and video AI applications with Vision Agents.
- [Voximplant](https://voximplant.com/products/gemini-client) — Connect inbound and outbound calls to Live API with Voximplant.
- [Firebase AI SDK](https://firebase.google.com/docs/ai-logic/live-api?api=dev) — Get started with the Gemini Live API using Firebase AI Logic.

## Audio Formats

- **Input**: Raw PCM, little-endian, 16-bit, mono, 16kHz native sample rate (other rates are resampled). MIME type: `audio/pcm;rate=16000`. A conversion sketch follows below.
- **Output**: Raw PCM, little-endian, 16-bit, mono, 24kHz sample rate.

> [!IMPORTANT]
> Use `send_realtime_input` / `sendRealtimeInput` for all real-time user input (audio, video, **and text**). `send_client_content` / `sendClientContent` is **only** supported for seeding initial context history (requires setting `initial_history_in_client_content` in `history_config`). Do **not** use it to send new user messages during the conversation.

> [!WARNING]
> Do **not** use `media` in `sendRealtimeInput`. Use the specific keys: `audio` for audio data, `video` for images/video frames, and `text` for text input.
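The input format is strict, so audio that does not already match it (files, browser recordings, telephony streams) needs converting before it is sent. The following is a minimal sketch, not part of the skill itself: it assumes `ffmpeg` is installed and on `PATH`, and the helper name `to_live_api_pcm` is purely illustrative.

```python
# Hedged sketch: decode any audio file to the Live API input format
# (raw 16-bit little-endian mono PCM at 16 kHz) using ffmpeg.
# Assumes ffmpeg is installed; the helper name is illustrative only.
import subprocess


def to_live_api_pcm(path: str) -> bytes:
    result = subprocess.run(
        [
            "ffmpeg",
            "-i", path,                # any input format ffmpeg can decode
            "-f", "s16le",             # raw signed 16-bit little-endian output
            "-acodec", "pcm_s16le",
            "-ac", "1",                # mono
            "-ar", "16000",            # 16 kHz sample rate
            "pipe:1",                  # write raw bytes to stdout
        ],
        capture_output=True,
        check=True,
    )
    return result.stdout


# Slice the returned bytes into chunks and send each one with
# mime_type="audio/pcm;rate=16000", as shown in the Quick Start below.
```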
---

## Quick Start

### Authentication

#### Python

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")
```

#### JavaScript

```js
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: 'YOUR_API_KEY' });
```

### Connecting to the Live API

#### Python

```python
from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=[types.Modality.AUDIO],
    system_instruction=types.Content(
        parts=[types.Part(text="You are a helpful assistant.")]
    )
)

async with client.aio.live.connect(model="gemini-3.1-flash-live-preview", config=config) as session:
    pass  # Session is active
```

#### JavaScript

```js
const session = await ai.live.connect({
  model: 'gemini-3.1-flash-live-preview',
  config: {
    responseModalities: ['audio'],
    systemInstruction: { parts: [{ text: 'You are a helpful assistant.' }] }
  },
  callbacks: {
    onopen: () => console.log('Connected'),
    onmessage: (response) => console.log('Message:', response),
    onerror: (error) => console.error('Error:', error),
    onclose: () => console.log('Closed')
  }
});
```

### Sending Text

#### Python

```python
await session.send_realtime_input(text="Hello, how are you?")
```

#### JavaScript

```js
session.sendRealtimeInput({ text: 'Hello, how are you?' });
```

### Sending Audio

#### Python

```python
await session.send_realtime_input(
    audio=types.Blob(data=chunk, mime_type="audio/pcm;rate=16000")
)
```

#### JavaScript

```js
session.sendRealtimeInput({
  audio: { data: chunk.toString('base64'), mimeType: 'audio/pcm;rate=16000' }
});
```

### Sending Video

#### Python

```python
# frame: raw JPEG-encoded bytes
await session.send_realtime_input(
    video=types.Blob(data=frame, mime_type="image/jpeg")
)
```

#### JavaScript

```js
session.sendRealtimeInput({
  video: { data: frame.toString('base64'), mimeType: 'image/jpeg' }
});
```
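The Sending Audio snippet above assumes you already have PCM chunks in hand. A rough end-to-end sketch of a live microphone loop follows; it assumes the third-party `sounddevice` package, reuses `session` from the connection example, and the name `stream_microphone` is hypothetical.

```python
# Hedged sketch of a live microphone loop. Assumptions: the `sounddevice`
# package is installed, and `session` is an open Live API session created as in
# the connection example above. Names like `stream_microphone` are illustrative.
import asyncio

import sounddevice as sd
from google.genai import types

CHUNK_FRAMES = 1600  # 100 ms of audio at 16 kHz


async def stream_microphone(session) -> None:
    queue: asyncio.Queue = asyncio.Queue()
    loop = asyncio.get_running_loop()

    def on_audio(indata, frames, time_info, status) -> None:
        # Runs on the audio thread: hand raw PCM bytes over to the event loop.
        loop.call_soon_threadsafe(queue.put_nowait, bytes(indata))

    mic = sd.RawInputStream(
        samplerate=16000, channels=1, dtype="int16",
        blocksize=CHUNK_FRAMES, callback=on_audio,
    )
    with mic:
        while True:
            chunk = await queue.get()
            await session.send_realtime_input(
                audio=types.Blob(data=chunk, mime_type="audio/pcm;rate=16000")
            )
```

When the mic is paused, the skill's best practices say to signal the end of the audio stream (`audioStreamEnd`); in the Python SDK this presumably maps to `await session.send_realtime_input(audio_stream_end=True)`, but verify the parameter name against the current docs.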
### Receiving Audio and Text

> [!IMPORTANT]
> A single server event can contain **multiple content parts simultaneously** (e.g., audio chunks and transcript). Always process **all** parts in each event to avoid missing content.

#### Python

```python
async for response in session.receive():
    content = response.server_content
    if content:
        # Audio — process ALL parts in each event
        if content.model_turn:
            for part in content.model_turn.parts:
                if part.inline_data:
                    audio_data = part.inline_data.data
        # Transcription
        if content.input_transcription:
            print(f"User: {content.input_transcription.text}")
        if content.output_transcription:
            print(f"Gemini: {content.output_transcription.text}")
        # Interruption
        if content.interrupted is True:
            pass  # Stop playback, clear audio queue
```

#### JavaScript

```js
// Inside the onmessage callback
const content = response.serverContent;
if (content?.modelTurn?.parts) {
  for (const part of content.modelTurn.parts) {
    if (part.inlineData) {
      const audioData = part.inlineData.data; // Base64 encoded
    }
  }
}
if (content?.inputTranscription) console.log('User:', content.inputTranscription.text);
if (content?.outputTranscription) console.log('Gemini:', content.outputTranscription.text);
if (content?.interrupted) { /* Stop playback, clear audio queue */ }
```

---

## Limitations

- **Response modality** — Only `TEXT` **or** `AUDIO` per session, not both
- **Audio-only session** — 15 min without compression
- **Audio+video session** — 2 min without compression
- **Connection lifetime** — ~10 min (use session resumption)
- **Context window** — 128k tokens (native audio) / 32k tokens (standard)
- **Async function calling** — Not yet supported; function calling is synchronous only. The model will not start responding until you've sent the tool response.
- **Proactive audio** — Not yet supported in Gemini 3.1 Flash Live. Remove any configuration for this feature.
- **Affective dialogue** — Not yet supported in Gemini 3.1 Flash Live. Remove any configuration for this feature.
- **Code execution** — Not supported
- **URL context** — Not supported

## Migrating from Gemini 2.5 Flash Live

When migrating from `gemini-2.5-flash-native-audio-preview-12-2025` to `gemini-3.1-flash-live-preview`:

1. **Model string** — Update from `gemini-2.5-flash-native-audio-preview-12-2025` to `gemini-3.1-flash-live-preview`.
2. **Thinking configuration** — Use `thinkingLevel` (`minimal`, `low`, `medium`, `high`) instead of `thinkingBudget`. Default is `minimal` for lowest latency.
3. **Server events** — A single event can contain multiple content parts simultaneously (audio + transcript). Process **all** parts in each event.
4. **Client content** — `send_client_content` is only for seeding initial context history (set `initial_history_in_client_content` in `history_config`). Use `send_realtime_input` for text during conversation.
5. **Turn coverage** — Defaults to `TURN_INCLUDES_AUDIO_ACTIVITY_AND_ALL_VIDEO` instead of `TURN_INCLUDES_ONLY_ACTIVITY`. If sending constant video frames, consider sending only during audio activity to reduce costs.
6. **Async function calling** — Not yet supported. Function calling is synchronous only.
7. **Proactive audio & affective dialogue** — Not yet supported. Remove any configuration for these features.

## Best Practices

1. **Use headphones** when testing mic audio to prevent echo/self-interruption
2. **Enable context window compression** for sessions longer than 15 minutes
3. **Implement session resumption** to handle connection resets gracefully (a configuration sketch covering practices 2 and 3 follows this list)
4. **Use ephemeral tokens** for client-side deployments — never expose API keys in browsers
5. **Use `send_realtime_input`** for all real-time user input (audio, video, text). Reserve `send_client_content` only for seeding initial context history
6. **Send `audioStreamEnd`** when the mic is paused to flush cached audio
7. **Clear audio playback queues** on interruption signals
8. **Process all parts** in each server event — events can contain multiple content parts
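The configuration sketch below illustrates best practices 2 and 3. The type and field names (`ContextWindowCompressionConfig`, `SlidingWindow`, `SessionResumptionConfig`, `session_resumption_update`) follow the Session Management documentation linked further down; treat them as assumptions and verify them against the installed `google-genai` version.

```python
# Hedged sketch for best practices 2 and 3: context window compression plus
# session resumption. Verify type/field names against the Session Management docs.
from google.genai import types

previous_handle = None  # persist the latest handle so a dropped connection can resume

config = types.LiveConnectConfig(
    response_modalities=[types.Modality.AUDIO],
    # Sliding-window compression keeps long sessions under the context limit.
    context_window_compression=types.ContextWindowCompressionConfig(
        sliding_window=types.SlidingWindow(),
    ),
    # Passing a previously received handle restores the earlier session state.
    session_resumption=types.SessionResumptionConfig(handle=previous_handle),
)

# Inside the receive loop, capture the newest resumption handle as it arrives:
# async for response in session.receive():
#     update = response.session_resumption_update
#     if update and update.resumable and update.new_handle:
#         previous_handle = update.new_handle
```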
## Documentation Lookup

### When MCP is Installed (Preferred)

If the **`search_documentation`** tool (from the Google MCP server) is available, use it as your **only** documentation source:

1. Call `search_documentation` with your query
2. Read the returned documentation
3. **Trust MCP results** as the source of truth for API details — they are always up-to-date.

> [!IMPORTANT]
> When MCP tools are present, **never** fetch URLs manually. MCP provides up-to-date, indexed documentation that is more accurate and token-efficient than URL fetching.

### When MCP is NOT Installed (Fallback Only)

If no MCP documentation tools are available, fetch from the official docs index:

**llms.txt URL**: `https://ai.google.dev/gemini-api/docs/llms.txt`

This index contains links to all documentation pages in `.md.txt` format. Use web fetch tools to:

1. Fetch `llms.txt` to discover available documentation pages
2. Fetch specific pages (e.g., `https://ai.google.dev/gemini-api/docs/live-session.md.txt`)

### Key Documentation Pages

> [!IMPORTANT]
> These are not all the documentation pages. Use the `llms.txt` index to discover the full set of available pages.

- [Live API Overview](https://ai.google.dev/gemini-api/docs/live.md.txt) — getting started, raw WebSocket usage
- [Live API Capabilities Guide](https://ai.google.dev/gemini-api/docs/live-guide.md.txt) — voice config, transcription config, native audio (thinking), VAD configuration, media resolution
- [Live API Tool Use](https://ai.google.dev/gemini-api/docs/live-tools.md.txt) — function calling (sync and async), Google Search grounding
- [Session Management](https://ai.google.dev/gemini-api/docs/live-session.md.txt) — context window compression, session resumption, GoAway signals
- [Ephemeral Tokens](https://ai.google.dev/gemini-api/docs/ephemeral-tokens.md.txt) — secure client-side authentication for browser/mobile
- [WebSockets API Reference](https://ai.google.dev/api/live.md.txt) — raw WebSocket protocol details

## Supported Languages

The Live API supports 70 languages including: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Hindi, Arabic, Russian, and many more. Native audio models automatically detect and switch languages.
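The skill lists function calling as a capability and points to the Tool Use documentation above, but does not show it end to end. As a closing, hedged sketch: the `get_weather` tool is hypothetical, and the type names (`FunctionDeclaration`, `Tool`, `FunctionResponse`) and the `send_tool_response` call should be verified against the Live API Tool Use page.

```python
# Hedged sketch of synchronous function calling over the Live API. The weather
# tool is hypothetical; verify type and method names against the Tool Use docs.
from google.genai import types

get_weather = types.FunctionDeclaration(
    name="get_weather",
    description="Look up the current weather for a city.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={"city": types.Schema(type=types.Type.STRING)},
        required=["city"],
    ),
)

config = types.LiveConnectConfig(
    response_modalities=[types.Modality.AUDIO],
    tools=[types.Tool(function_declarations=[get_weather])],
)


async def handle_tool_calls(session, response) -> None:
    # The model pauses until the tool response arrives, so answer promptly.
    if response.tool_call:
        replies = []
        for call in response.tool_call.function_calls:
            result = {"weather": "sunny"}  # placeholder for a real lookup
            replies.append(
                types.FunctionResponse(id=call.id, name=call.name, response=result)
            )
        await session.send_tool_response(function_responses=replies)
```

Because function calling is synchronous, audio output stalls until the tool response is sent back, so keep tool implementations fast or return an acknowledgement immediately.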