Build a voice agent with Gemini Realtime on Stream’s edge network. You’ll need a free Stream account, a Google AI Studio API key, and 18 lines of Python.


1. Create a project

Requires uv and CPython 3.12.
mkdir my-agent && cd my-agent
uv init && uv add "vision-agents[getstream,gemini]" python-dotenv

2. Add your API keys

Create a .env file in your project root. Vision Agents auto-loads these for each plugin:
# Stream — getstream.io/try-for-free
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret

# Google AI Studio — aistudio.google.com
GOOGLE_API_KEY=your_google_api_key
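Under the hood, load_dotenv() simply reads KEY=VALUE pairs from the file into the process environment. A minimal stdlib-only sketch of that behavior (the real python-dotenv also handles quoting, export prefixes, and variable interpolation):

```python
import os


def load_env_file(path: str = ".env") -> None:
    """Read KEY=VALUE lines from a .env file into os.environ.

    Simplified sketch of python-dotenv's load_dotenv(); not the
    actual implementation.
    """
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            # Skip blank lines, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Variables already set in the environment win, matching
            # load_dotenv's default (override=False).
            os.environ.setdefault(key.strip(), value.strip())
```

This is why the keys never appear in your code: each plugin reads its credentials from the environment at startup.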

3. Write the agent

Create main.py:
from dotenv import load_dotenv

from vision_agents.core import Agent, AgentLauncher, User, Runner
from vision_agents.plugins import getstream, gemini

load_dotenv()


async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Assistant", id="agent"),
        instructions="You're a helpful voice assistant. Be concise.",
        llm=gemini.Realtime(),
    )


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        await agent.simple_response("Greet the user")
        await agent.finish()


if __name__ == "__main__":
    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()

4. Run it

uv run main.py run

Swap to OpenAI Realtime by changing one line (plus the import):
from vision_agents.plugins import openai

llm=openai.Realtime()
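
For context, here is create_agent after the swap (a sketch, not runnable without credentials; it assumes the OpenAI plugin reads the standard OPENAI_API_KEY variable from the environment, so add that key to your .env):

```python
from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, openai


async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Assistant", id="agent"),
        instructions="You're a helpful voice assistant. Be concise.",
        # Only this line changed from the Gemini version.
        llm=openai.Realtime(),
    )
```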

Next Steps

Voice Agents

Custom STT/LLM/TTS pipelines, function calling, provider options

Video Agents

VLMs, YOLO processors, real-time video analysis

Deploy to Production

Docker, Kubernetes, and monitoring

Browse Integrations

25+ AI providers to mix and match