Documentation Index

Fetch the complete documentation index at: https://visionagents.ai/llms.txt

Use this file to discover all available pages before exploring further.

You’ll build a real-time voice agent you can talk to in your browser, using Gemini Realtime on Stream’s edge network. About 18 lines of Python.

Copy this prompt into Claude Code, Cursor, Windsurf, or any coding agent to scaffold your project.

Build your agent

Set up your project

Create a project directory and install Vision Agents with the getstream and gemini plugins. If you don’t have uv yet, install it first.
uv init --python 3.12 my-agent && cd my-agent
uv add "vision-agents[getstream,gemini]" python-dotenv
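If you want to confirm the install before writing any code, a quick import check is enough. This optional command only exercises the package path used by the quickstart code below:

uv run python -c "import vision_agents.core; print('ok')"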

Add your API keys

Get the keys you’ll need: a Stream API key and secret from your Stream dashboard, and a Google API key for Gemini. Then create a .env file in the project root. Vision Agents auto-loads these for each plugin.
.env
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret
GOOGLE_API_KEY=your_google_api_key
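If the agent later complains about missing credentials, a small throwaway script can confirm the keys are loading. This optional sketch uses only python-dotenv and the variable names from the .env above; check_env.py is just a hypothetical helper name, not part of the project.
check_env.py
import os

from dotenv import load_dotenv

# Load .env from the project root, the same way main.py does.
load_dotenv()

# Report which keys are present without printing their values.
for key in ("STREAM_API_KEY", "STREAM_API_SECRET", "GOOGLE_API_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")

Run it with uv run check_env.py and delete the file once everything reads set.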

Write the agent

Create main.py. The agent joins a Stream call and responds via Gemini Realtime.
main.py
from dotenv import load_dotenv

from vision_agents.core import Agent, AgentLauncher, User, Runner
from vision_agents.plugins import getstream, gemini

load_dotenv()


async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Assistant", id="agent"),
        instructions="You're a helpful voice assistant. Be concise.",
        llm=gemini.Realtime(),
    )


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        await agent.simple_response("Greet the user")
        await agent.finish()


if __name__ == "__main__":
    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()

Run it

Start the agent. The CLI prints a join link; open it in your browser to talk to your agent.
uv run main.py run
The agent greets you as soon as you join the call.

Next steps

Voice Agents

Custom STT/LLM/TTS pipelines, function calling, provider options

Video Agents

VLMs, YOLO processors, real-time video analysis

Deploy to Production

Docker, Kubernetes, and monitoring

Browse Integrations

25+ AI providers to mix and match