Google’s Gemini provides language models with built-in tools for search, code execution, RAG, and URL context. In LLM mode, you must supply separate speech-to-text (STT) and text-to-speech (TTS) plugins.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Gemini also provides Realtime speech-to-speech with optional video over WebSocket.

Installation

uv add "vision-agents[gemini]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import gemini, getstream, deepgram, elevenlabs

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
)

Built-in Tools

Gemini provides built-in tools you can enable:
llm = gemini.LLM(
    model="gemini-3-flash-preview",
    tools=[
        gemini.tools.GoogleSearch(),
        gemini.tools.CodeExecution(),
        gemini.tools.FileSearch(store),  # RAG; store is a GeminiFilesearchRAG (see File Search below)
        gemini.tools.URLContext(),
    ]
)
| Tool | Description |
| --- | --- |
| GoogleSearch | Ground responses with web data |
| CodeExecution | Run Python code |
| FileSearch | RAG over your documents |
| URLContext | Read specific web pages |

File Search (RAG)

Managed RAG with automatic chunking and retrieval:
from vision_agents.plugins import gemini

# create() and add_directory() are awaited, so run this inside an async function.
store = gemini.GeminiFilesearchRAG(name="my-knowledge-base")
await store.create()
await store.add_directory("./knowledge")

llm = gemini.LLM(
    model="gemini-3-flash-preview",
    tools=[gemini.tools.FileSearch(store)]
)
See the RAG guide for more details.
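To make "chunking and retrieval" concrete, here is a toy, standard-library sketch of the two steps. It is not how Gemini File Search works internally; the managed service does this for you, and its chunk sizes and ranking are not described on this page. The helpers `chunk_text` and `retrieve`, and the keyword-overlap scoring, are assumptions made purely for illustration.

```python
import re


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous window.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by how many query tokens they share (toy scoring, no embeddings)."""
    query_tokens = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(query_tokens & tokenize(c)), reverse=True)
    return ranked[:top_k]


docs = (
    "Refunds are accepted within 30 days of purchase. "
    "Shipping takes 5 business days inside the EU. "
    "Support is available by email on weekdays."
)
chunks = chunk_text(docs, size=8, overlap=2)
top = retrieve("How long does shipping take?", chunks, top_k=1)
print(top[0])
```

The overlap between windows keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, which is the usual reason chunkers do not cut text into disjoint pieces.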

Function Calling

@agent.llm.register_function(description="Get weather for a location")
async def get_weather(location: str) -> dict:
    # Placeholder result; replace with a real weather lookup.
    return {"temperature": "22°C", "condition": "Sunny"}
See the Function Calling guide for details.
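For intuition about what a decorator like this has to do, the sketch below builds a JSON-Schema-style tool declaration from a function's signature and type hints. It is a self-contained illustration, not the vision_agents implementation: `FunctionRegistry`, `_JSON_TYPES`, and the extra `units` parameter are hypothetical names introduced here.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Hypothetical mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number",
               bool: "boolean", dict: "object", list: "array"}


class FunctionRegistry:
    """Toy stand-in for an LLM function registry (assumption, not the real API)."""

    def __init__(self) -> None:
        self.functions: dict[str, Callable[..., Any]] = {}
        self.schemas: dict[str, dict[str, Any]] = {}

    def register_function(self, description: str) -> Callable:
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            hints = get_type_hints(func)
            hints.pop("return", None)
            properties: dict[str, dict[str, str]] = {}
            required: list[str] = []
            for name, param in inspect.signature(func).parameters.items():
                # Unannotated or unknown types fall back to "string".
                properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
                # Parameters without a default are required by the model.
                if param.default is inspect.Parameter.empty:
                    required.append(name)
            self.functions[func.__name__] = func
            self.schemas[func.__name__] = {
                "name": func.__name__,
                "description": description,
                "parameters": {"type": "object", "properties": properties, "required": required},
            }
            return func  # The original function stays callable as-is.
        return decorator


registry = FunctionRegistry()


@registry.register_function(description="Get weather for a location")
async def get_weather(location: str, units: str = "metric") -> dict:
    # `units` is a hypothetical optional parameter, added to show required vs. optional.
    return {"temperature": "22°C", "condition": "Sunny"}


schema = registry.schemas["get_weather"]
print(schema["parameters"])
```

The practical takeaway is that accurate type hints and a clear `description` matter, since a declaration of this kind is typically all the model sees when deciding whether and how to call the function.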

Events

The Gemini plugin emits events for connection state and responses. Most developers should use the core events (LLMResponseCompletedEvent, etc.) for provider-agnostic code.
from vision_agents.plugins.gemini.events import (
    GeminiConnectedEvent,
    GeminiErrorEvent,
)

@agent.events.subscribe
async def on_gemini_connected(event: GeminiConnectedEvent):
    print(f"Connected to Gemini model: {event.model}")

@agent.events.subscribe
async def on_gemini_error(event: GeminiErrorEvent):
    print(f"Gemini error: {event.error}")
| Event | Description |
| --- | --- |
| GeminiConnectedEvent | Realtime connection established |
| GeminiErrorEvent | Error occurred |
| GeminiAudioEvent | Audio output received |
| GeminiTextEvent | Text output received |
| GeminiResponseEvent | Response chunk received |
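The handlers above carry no explicit event name; the type annotation on the `event` parameter appears to select which events each handler receives. A minimal sketch of that pattern, assuming nothing about the real event manager, looks like this. `EventBus`, `ConnectedEvent`, and `ErrorEvent` are hypothetical stand-ins defined only for this example.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, get_type_hints


@dataclass
class ConnectedEvent:  # hypothetical stand-in for GeminiConnectedEvent
    model: str


@dataclass
class ErrorEvent:  # hypothetical stand-in for GeminiErrorEvent
    error: str


class EventBus:
    """Toy event bus that routes events by each handler's annotated event type."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable[[Any], Awaitable[None]]]] = {}

    def subscribe(self, handler: Callable[[Any], Awaitable[None]]):
        # Use the first annotated parameter to decide which events to deliver.
        hints = get_type_hints(handler)
        hints.pop("return", None)
        event_type = next(iter(hints.values()))
        self._handlers.setdefault(event_type, []).append(handler)
        return handler

    async def send(self, event: Any) -> None:
        # Deliver only to handlers registered for this exact event class.
        for handler in self._handlers.get(type(event), []):
            await handler(event)


bus = EventBus()
seen: list[str] = []


@bus.subscribe
async def on_connected(event: ConnectedEvent):
    seen.append(f"connected:{event.model}")


@bus.subscribe
async def on_error(event: ErrorEvent):
    seen.append(f"error:{event.error}")


asyncio.run(bus.send(ConnectedEvent(model="gemini-3-flash-preview")))
print(seen)
```

In a design like this, a handler with a missing or wrong annotation is never called, so the annotation is part of the subscription rather than optional documentation.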

Next Steps

Gemini Realtime: Speech-to-speech with optional video

Build a Voice Agent: Get started with voice