Gemini LLM

Google’s Gemini provides language models with built-in tools for search, code execution, RAG, and URL context. In LLM mode, the agent needs separate speech-to-text (STT) and text-to-speech (TTS) plugins.

Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Gemini also provides Realtime speech-to-speech with optional video over WebSocket.

Installation

uv add "vision-agents[gemini]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import gemini, getstream, deepgram, elevenlabs

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
)

Built-in Tools

Gemini provides built-in tools you can enable:

llm = gemini.LLM(
    model="gemini-3-flash-preview",
    tools=[
        gemini.tools.GoogleSearch(),
        gemini.tools.CodeExecution(),
        # `store` is a GeminiFilesearchRAG instance; see File Search (RAG)
        gemini.tools.FileSearch(store),  # RAG
        gemini.tools.URLContext(),
    ],
)

Tool	Description
`GoogleSearch`	Ground responses with web data
`CodeExecution`	Run Python code
`FileSearch`	RAG over your documents
`URLContext`	Read specific web pages

File Search (RAG)

Managed RAG with automatic chunking and retrieval:

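from vision_agents.plugins import gemini

# create() and add_directory() are awaited, so run this inside an async function
store = gemini.GeminiFilesearchRAG(name="my-knowledge-base")
await store.create()
await store.add_directory("./knowledge")  # add the files in ./knowledge to the store

# Expose the store to the model as a built-in tool
llm = gemini.LLM(
    model="gemini-3-flash-preview",
    tools=[gemini.tools.FileSearch(store)],
)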

See the RAG guide for more details.
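
To make "chunking and retrieval" concrete, here is a toy sketch in plain Python. It is not how Gemini's managed File Search works internally; the `chunk_text` and `retrieve` helpers are hypothetical names, and the fixed-size word chunks and token-overlap scoring are simplifying assumptions chosen only to show the shape of the pipeline.

```python
# Toy illustration only: fixed-size chunks and bag-of-words overlap scoring.
# A managed RAG service would use a far more capable chunker and ranker.
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 7) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokenize(text: str) -> Counter:
    """Return lowercase token counts, ignoring punctuation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k chunks that share the most tokens with the query."""
    query_tokens = tokenize(query)

    def overlap(chunk: str) -> int:
        chunk_tokens = tokenize(chunk)
        return sum(min(query_tokens[t], chunk_tokens[t]) for t in query_tokens)

    return sorted(chunks, key=overlap, reverse=True)[:top_k]


document = (
    "Refunds are processed within five business days. "
    "Shipping to Europe takes roughly two weeks. "
    "Support is available by email every weekday."
)
chunks = chunk_text(document, max_words=7)
best = retrieve("How long does shipping to Europe take?", chunks)
print(best)
```

With the managed tool, both steps happen on Google's side; the agent code only creates the store and passes it to `gemini.tools.FileSearch`.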

Function Calling

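# `agent` is the Agent created in Quick Start
@agent.llm.register_function(description="Get weather for a location")
async def get_weather(location: str) -> dict:
    # Placeholder response; replace with a call to a real weather service
    return {"temperature": "22°C", "condition": "Sunny"}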

See the Function Calling guide for details.
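
For intuition about what a decorator like `register_function` might do, the sketch below builds a minimal stand-in registry. `FunctionRegistry` and its `call` method are hypothetical and are not the Vision Agents implementation; the assumption is only that the decorator records the description, derives parameter names and types from the function's type hints, and keeps a reference to the handler so it can be invoked later with model-supplied arguments.

```python
# Hypothetical stand-in for a tool registry; not the Vision Agents class.
import asyncio
from typing import Any, get_type_hints


class FunctionRegistry:
    def __init__(self) -> None:
        # Maps tool name -> description, parameter types, and the handler itself
        self.tools: dict[str, dict[str, Any]] = {}

    def register_function(self, description: str):
        def decorator(func):
            hints = get_type_hints(func)
            hints.pop("return", None)  # keep only the parameter annotations
            self.tools[func.__name__] = {
                "description": description,
                "parameters": {name: tp.__name__ for name, tp in hints.items()},
                "handler": func,
            }
            return func  # leave the original function usable as-is

        return decorator

    async def call(self, name: str, **arguments: Any) -> Any:
        """Invoke a registered async function by name, as a model's tool call would."""
        return await self.tools[name]["handler"](**arguments)


registry = FunctionRegistry()


@registry.register_function(description="Get weather for a location")
async def get_weather(location: str) -> dict:
    return {"temperature": "22°C", "condition": "Sunny"}


result = asyncio.run(registry.call("get_weather", location="Berlin"))
print(registry.tools["get_weather"]["parameters"], result)
```

The practical takeaway is that accurate type hints and a clear description matter, since they are the information a model is likely to see when deciding whether and how to call the function.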

Events

The Gemini plugin emits events for connection state and responses. Most developers should use the core events (LLMResponseCompletedEvent, etc.) for provider-agnostic code.

from vision_agents.plugins.gemini.events import (
    GeminiConnectedEvent,
    GeminiErrorEvent,
)

@agent.events.subscribe
async def on_gemini_connected(event: GeminiConnectedEvent):
    print(f"Connected to Gemini model: {event.model}")

@agent.events.subscribe
async def on_gemini_error(event: GeminiErrorEvent):
    print(f"Gemini error: {event.error}")

Event	Description
`GeminiConnectedEvent`	Realtime connection established
`GeminiErrorEvent`	Error occurred
`GeminiAudioEvent`	Audio output received
`GeminiTextEvent`	Text output received
`GeminiResponseEvent`	Response chunk received
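
In the handlers above, the only thing that distinguishes one subscription from another is the type annotation on the `event` parameter. The sketch below shows one way such annotation-based dispatch could work. It is an assumption for illustration, not the Vision Agents event system: `EventManager`, `send`, `ConnectedEvent`, and `ErrorEvent` are hypothetical names.

```python
# Hypothetical sketch of annotation-based event dispatch using only the stdlib.
import asyncio
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class ConnectedEvent:  # stand-in for a provider "connected" event
    model: str


@dataclass
class ErrorEvent:  # stand-in for a provider error event
    error: str


class EventManager:
    def __init__(self) -> None:
        # Maps event class -> list of async handlers subscribed to it
        self._handlers: dict[type, list] = {}

    def subscribe(self, handler):
        hints = get_type_hints(handler)
        hints.pop("return", None)
        (event_type,) = hints.values()  # expects exactly one annotated parameter
        self._handlers.setdefault(event_type, []).append(handler)
        return handler

    async def send(self, event) -> None:
        # Only handlers registered for this exact event class are called
        for handler in self._handlers.get(type(event), []):
            await handler(event)


events = EventManager()
log: list[str] = []


@events.subscribe
async def on_connected(event: ConnectedEvent):
    log.append(f"connected: {event.model}")


@events.subscribe
async def on_error(event: ErrorEvent):
    log.append(f"error: {event.error}")


asyncio.run(events.send(ConnectedEvent(model="gemini-3-flash-preview")))
print(log)
```

Under this design, subscribing to a core event instead of a Gemini-specific one only requires changing the annotation, which is why provider-agnostic handlers are straightforward to write.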

Next Steps

Gemini Realtime

Build a Voice Agent
