Documentation Index

Fetch the complete documentation index at: https://visionagents.ai/llms.txt

Use this file to discover all available pages before exploring further.

You’ll build a real-time voice agent you can talk to in your browser, using Gemini Realtime on Stream’s edge network. About 18 lines of Python.

Copy this prompt into Claude Code, Cursor, Windsurf, or any coding agent to scaffold your project.

Build your agent

Set up your project

Create a project directory and install Vision Agents with the getstream and gemini plugins. If you don’t have uv yet, install it first.
uv init --python 3.12 my-agent && cd my-agent
uv add "vision-agents[getstream,gemini]" python-dotenv
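If you want to confirm the install before writing any code, a quick import check is enough. This optional command only exercises the package path used by the quickstart code below:

uv run python -c "import vision_agents.core; print('ok')"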

Add your API keys

Get the keys you’ll need: a Stream API key and secret from your Stream dashboard, and a Google API key for Gemini. Then create a .env file in the project root. Vision Agents auto-loads these for each plugin.
.env
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret
GOOGLE_API_KEY=your_google_api_key
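If the agent later complains about missing credentials, a small throwaway script can confirm the keys are loading. This optional sketch uses only python-dotenv and the variable names from the .env above; check_env.py is just a hypothetical helper name, not part of the project.
check_env.py
import os

from dotenv import load_dotenv

# Load .env from the project root, the same way main.py does.
load_dotenv()

# Report which keys are present without printing their values.
for key in ("STREAM_API_KEY", "STREAM_API_SECRET", "GOOGLE_API_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")

Run it with uv run check_env.py and delete the file once everything reads set.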

Write the agent

Create main.py. The agent joins a Stream call and responds via Gemini Realtime.
main.py
from dotenv import load_dotenv

from vision_agents.core import Agent, AgentLauncher, User, Runner
from vision_agents.plugins import getstream, gemini

load_dotenv()


async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Assistant", id="agent"),
        instructions="You're a helpful voice assistant. Be concise.",
        llm=gemini.Realtime(),
    )


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        await agent.simple_response("Greet the user")
        await agent.finish()


if __name__ == "__main__":
    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()

Run it

Start the agent. The CLI prints a join link; open it in your browser to talk to your agent.
uv run main.py run
The agent greets you as soon as you join the call.

Next steps

Voice Agents

Custom STT/LLM/TTS pipelines, function calling, provider options

Video Agents

VLMs, YOLO processors, real-time video analysis

Deploy to Production

Docker, Kubernetes, and monitoring

Browse Integrations

25+ AI providers to mix and match