Gemini

Google’s Gemini Live API is a low-latency API that combines video analysis, transcription, text-to-speech synthesis, function calling, and more into a single pipeline. The Gemini Live plugin in the Vision Agents SDK is a native integration for realtime video and audio, with out-of-the-box support for Google’s Gemini Live models. It streams audio and video frames to Gemini over WebSockets and receives responses in real time. It also supports MCP and function calling, so agents can take actions on your behalf. This makes it a good fit for conversational agents, AI avatars, fitness coaches, visual accessibility assistants, remote support tools with visual guidance, and interactive tutors.

Installation

Install the Gemini Live plugin with:

uv add "vision-agents[gemini]"

Tutorials

The Voice AI quickstart and Video AI quickstart pages have examples to get you up and running.

Example

Check out our Gemini Live example to see a practical implementation of the plugin and get inspiration for your own projects, or read on for some key details.

Initialization

The Gemini plugin for Stream is exposed as the Realtime class:

from vision_agents.plugins import gemini

realtime = gemini.Realtime()

Parameters

These are the parameters available in the gemini.Realtime plugin:

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | `str` | `"gemini-3-pro-preview"` | The Gemini model to use. Supports Live-enabled models including Gemini 3 Pro and Gemini 2.5 Flash. |
| `config` | `LiveConnectConfigDict` or `None` | `None` | Configuration for the Gemini Live connection. If `None`, uses sensible defaults. |
| `api_key` | `str` or `None` | `None` | Your Gemini API key. If not provided, the SDK will look for it in env vars. |
| `fps` | `int` | `1` | Number of video frames per second to send to Gemini. |
| `client` | `genai.Client` or `None` | `None` | Optional pre-configured Gemini client. If provided, uses this instead of creating one. |
| `http_options` | `HttpOptions` or `None` | `None` | HTTP options for the Gemini client connection. |
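The fps parameter caps how many video frames per second are forwarded to Gemini. The sketch below illustrates one way such a cap can work: a frame is forwarded only if enough time has passed since the last forwarded frame. FrameThrottle and select_frames are hypothetical helpers written for this illustration; they are not part of the plugin, and the plugin's own sampling logic may differ.

```python
from typing import List, Optional


class FrameThrottle:
    """Hypothetical helper: lets through at most `fps` frames per second."""

    def __init__(self, fps: int = 1) -> None:
        if fps <= 0:
            raise ValueError("fps must be positive")
        # Minimum spacing, in seconds, between two forwarded frames.
        self.min_interval = 1.0 / fps
        self._last_sent: Optional[float] = None

    def should_send(self, timestamp: float) -> bool:
        """Return True if the frame captured at `timestamp` should be forwarded."""
        if self._last_sent is None or timestamp - self._last_sent >= self.min_interval:
            self._last_sent = timestamp
            return True
        return False


def select_frames(timestamps: List[float], fps: int = 1) -> List[float]:
    """Return the capture timestamps that survive throttling to `fps`."""
    throttle = FrameThrottle(fps)
    return [t for t in timestamps if throttle.should_send(t)]
```

A low value such as the default of 1 keeps bandwidth and token usage down; raising it only helps when the scene changes faster than once per second.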

Functionality

Connect

The connect() method establishes a WebSocket connection to Gemini Live:

await realtime.connect()

Send Text Message

The simple_response() method allows you to send a text instruction to Gemini:

await realtime.simple_response("Describe what you see and say hi")
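Because connect() and simple_response() are awaited, they have to run inside a coroutine. The sketch below shows the call order using a stand-in object so that it runs without network access or credentials. StubRealtime and run_session are hypothetical names for this illustration; the rule that sending before connecting raises an error belongs to the stub and is an assumption, not documented plugin behavior.

```python
import asyncio
from typing import List


class StubRealtime:
    """Stand-in for gemini.Realtime that records calls instead of contacting Gemini."""

    def __init__(self) -> None:
        self.connected = False
        self.sent: List[str] = []

    async def connect(self) -> None:
        self.connected = True

    async def simple_response(self, text: str) -> None:
        # The stub's own rule (an assumption): a connection must exist first.
        if not self.connected:
            raise RuntimeError("call connect() before sending messages")
        self.sent.append(text)


async def run_session(realtime: StubRealtime, prompt: str) -> None:
    """Connect first, then send a single text instruction."""
    await realtime.connect()
    await realtime.simple_response(prompt)


stub = StubRealtime()
asyncio.run(run_session(stub, "Describe what you see and say hi"))
```

In a real application the same run_session coroutine would presumably receive a gemini.Realtime instance instead of the stub.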

Send Audio

The simple_audio_response() method allows you to send audio data to Gemini:

await realtime.simple_audio_response(pcm_data)
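The call above takes pcm_data without showing where it comes from. As a rough sketch, the code below builds a short test tone as raw 16-bit little-endian mono PCM using only the standard library. The 16 kHz sample rate and the raw bytes representation are assumptions; the plugin may expect its own audio container type rather than plain bytes, so check the SDK before passing this through. sine_pcm16 is a hypothetical helper.

```python
import math
import struct


def sine_pcm16(
    freq_hz: float,
    duration_s: float,
    sample_rate: int = 16_000,  # assumed rate, not confirmed by the plugin docs
    amplitude: float = 0.5,
) -> bytes:
    """Return a sine tone as signed 16-bit little-endian mono PCM bytes."""
    n_samples = int(duration_s * sample_rate)
    peak = int(amplitude * 32767)  # scale into the int16 range
    samples = [
        int(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    ]
    # "<" = little-endian, "h" = signed 16-bit integer, repeated n_samples times.
    return struct.pack(f"<{n_samples}h", *samples)


pcm_bytes = sine_pcm16(440.0, 1.0)
```

Each sample occupies two bytes, so one second at 16 kHz yields 32,000 bytes.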

Advanced: Send Realtime Input

For more control, you can use the native send_realtime_input() method, which wraps Gemini’s API:

await realtime.send_realtime_input(text="Hello", media=blob)

Function Calling and MCP

Gemini Live supports function calling and MCP (Model Context Protocol) tools. When using the Realtime plugin via the main Agent class, you can register tools that Gemini can call. Follow the instructions in the MCP tool calling guide, using the Gemini Realtime class as your LLM. The plugin automatically handles:

Converting your tool definitions to Gemini’s format
Executing function calls when Gemini requests them
Sending function results back to Gemini
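To make the three steps above concrete, here is a simplified, standalone illustration of what such handling can look like. It is not the plugin's actual implementation. to_declaration, execute_call, and the get_weather tool are hypothetical, and the declaration dict only loosely follows the name/description/parameters shape used for function declarations.

```python
import inspect
from typing import Any, Callable, Dict

# Map Python annotations to JSON-schema-style type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def to_declaration(func: Callable[..., Any]) -> Dict[str, Any]:
    """Step 1: describe a Python function as a tool declaration."""
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the argument is required
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def execute_call(
    tools: Dict[str, Callable[..., Any]], name: str, args: Dict[str, Any]
) -> Dict[str, Any]:
    """Steps 2 and 3: run the requested tool and package the result for the model."""
    try:
        return {"name": name, "response": {"result": tools[name](**args)}}
    except Exception as exc:  # report failures to the model instead of crashing
        return {"name": name, "response": {"error": str(exc)}}


def get_weather(city: str, units: str = "metric") -> str:
    """Return a canned weather report for a city."""
    return f"Sunny in {city} ({units})"


declaration = to_declaration(get_weather)
reply = execute_call({"get_weather": get_weather}, "get_weather", {"city": "Paris"})
```

Returning errors as part of the response, rather than raising, lets the model see that a call failed and decide how to continue.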

Configuration

The Gemini Live API uses LiveConnectConfigDict for configuration. You can customize various aspects of the connection:

from google.genai.types import LiveConnectConfigDict, Modality, SpeechConfigDict
from vision_agents.plugins import gemini

config = LiveConnectConfigDict(
    response_modalities=[Modality.AUDIO],
    speech_config=SpeechConfigDict(
        language_code="en-US",
    ),
)

realtime = gemini.Realtime(config=config)

Events

The Gemini plugin emits standard Vision Agents events that you can listen to:

RealtimeConnectedEvent: Fired when the connection is established
RealtimeDisconnectedEvent: Fired when the connection is closed
RealtimeAudioOutputEvent: Fired when Gemini generates audio
LLMResponseChunkEvent: Fired when Gemini generates text
RealtimeTranscriptEvent: Fired for transcriptions

Access these events through the Agent’s event system. See the Event System guide for more details.
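As a rough mental model of type-based event handling, the toy dispatcher below routes each event to the handlers registered for its class. Everything here is a stand-in: ToyEventBus is not the Agent's event system, the two dataclasses only borrow the event names listed above, and their fields (session_id, text) are placeholders invented for the example. Refer to the Event System guide for the real subscription API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, List


@dataclass
class RealtimeConnectedEvent:
    """Stand-in event; the real class and its fields may differ."""
    session_id: str


@dataclass
class RealtimeTranscriptEvent:
    """Stand-in event; the real class and its fields may differ."""
    text: str


class ToyEventBus:
    """Minimal dispatcher that routes events to handlers by event class."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[type, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable[[Any], None]) -> None:
        self._handlers[event_type].append(handler)

    def emit(self, event: Any) -> None:
        # Only handlers registered for this exact event class are called.
        for handler in self._handlers[type(event)]:
            handler(event)


bus = ToyEventBus()
transcripts: List[str] = []
bus.subscribe(RealtimeTranscriptEvent, lambda e: transcripts.append(e.text))

bus.emit(RealtimeConnectedEvent(session_id="demo"))  # no handler registered, ignored
bus.emit(RealtimeTranscriptEvent(text="hello there"))
```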
