You’ll build a real-time voice agent you can talk to in your browser, using Gemini Realtime on Stream’s edge network. It takes about 25 lines of Python. New to Vision Agents? Read the Overview first.

Build your agent

Install uv first; it includes uvx. The init command runs uv sync to provision your virtual environment, so uv must be on your PATH. Python 3.10–3.13 is supported.
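The prerequisites above can be checked before scaffolding. The following is a minimal sketch, not part of Vision Agents: the helper names `python_supported` and `uv_on_path` are hypothetical, and the check only covers the interpreter that runs the script, which may differ from the one uv selects for the project.

```python
import shutil
import sys

# Supported range taken from the text above: Python 3.10 through 3.13.
SUPPORTED_MIN = (3, 10)
SUPPORTED_MAX = (3, 13)


def python_supported(version=None) -> bool:
    """Return True if the (major, minor) version is within the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def uv_on_path() -> bool:
    """Return True if a `uv` executable can be found on PATH."""
    return shutil.which("uv") is not None


if __name__ == "__main__":
    print(f"Python supported: {python_supported()}")
    print(f"uv on PATH: {uv_on_path()}")
```

If either check prints False, fix that first; the init command cannot provision the environment without uv.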

Scaffold your project

One command creates a ready-to-run agent project with dependencies installed:
uvx vision-agents init my-agent && cd my-agent
This generates agent.py, pyproject.toml, .env.example, tests/, and a Dockerfile, then runs uv sync.
Pass --no-install to skip uv sync if you only want the project files: uvx vision-agents init my-agent --no-install
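To confirm the scaffold completed, you can check for the files listed above. This is a small sketch using only the standard library; the helper name `missing_scaffold_files` is hypothetical, and the file list is copied from the text above rather than read from the tool itself.

```python
from pathlib import Path

# Files and directories the init command is described as generating.
EXPECTED = ("agent.py", "pyproject.toml", ".env.example", "tests", "Dockerfile")


def missing_scaffold_files(project_dir) -> list[str]:
    """Return the expected scaffold entries that do not exist in project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    # Run from inside the project directory, e.g. after `cd my-agent`.
    missing = missing_scaffold_files(".")
    print("Scaffold complete" if not missing else f"Missing: {missing}")
```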

Add your API keys

You’ll need a Stream API key and secret and a Google API key. Copy the scaffolded template and fill in your keys. Vision Agents auto-loads these for each plugin.
cp .env.example .env
.env
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret
GOOGLE_API_KEY=your_google_api_key
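A missing or unedited key is an easy mistake to make at this step, so a quick check before the first run can help. The sketch below is an assumption-laden helper, not a Vision Agents API: `missing_keys` is a hypothetical name, and it treats the template placeholders shown above (such as your_stream_api_key) as unset.

```python
import os
from typing import Mapping

# Key names taken from the .env template above.
REQUIRED_KEYS = ("STREAM_API_KEY", "STREAM_API_SECRET", "GOOGLE_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return required keys that are unset, blank, or still the template placeholder."""
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "").strip()
        # The template placeholder is "your_" plus the lowercased key name.
        if not value or value == f"your_{key.lower()}":
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Assumes the .env values are already in the process environment,
    # for example after load_dotenv() has run.
    problems = missing_keys()
    print("All keys set" if not problems else f"Set these keys: {problems}")
```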

Understand your agent

Open agent.py. The init command already created this file. Walk through each section to see how the agent works, then customize as needed.
agent.py
from dotenv import load_dotenv
from vision_agents.core import Agent, Runner, User
from vision_agents.core.agents import AgentLauncher
from vision_agents.plugins import getstream, gemini

load_dotenv()

INSTRUCTIONS = (
    "You're a helpful voice AI assistant. "
    "Keep responses short and conversational."
)

async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=getstream.Edge(),
        agent_user=User(name="My AI assistant", id="agent"),
        instructions=INSTRUCTIONS,
        llm=gemini.Realtime(),
    )

async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        await agent.simple_response(text="Say hi and introduce yourself.")
        await agent.finish()

runner = Runner(AgentLauncher(create_agent=create_agent, join_call=join_call))

if __name__ == "__main__":
    runner.cli()
  • create_agent: builds the agent with Stream transport and Gemini Realtime.
  • join_call: creates a call, joins it, and triggers the first response.
  • runner: entry point for the CLI; pyproject.toml references it as agent:runner.
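The agent:runner reference follows the common Python convention of module name, colon, attribute name. How the Vision Agents CLI loads it is not shown here; the sketch below only illustrates the convention with the standard library, and `resolve_reference` is a hypothetical helper.

```python
from importlib import import_module


def resolve_reference(reference: str):
    """Resolve a "module:attribute" string to the object it names."""
    module_name, _, attribute = reference.partition(":")
    if not module_name or not attribute:
        raise ValueError(f"expected 'module:attribute', got {reference!r}")
    target = import_module(module_name)
    # Walk dotted attribute paths such as "module:Class.method".
    for part in attribute.split("."):
        target = getattr(target, part)
    return target


if __name__ == "__main__":
    # Demonstrated with a standard-library reference; inside the project
    # directory, "agent:runner" would resolve to the Runner instance
    # defined at the bottom of agent.py.
    dumps = resolve_reference("json:dumps")
    print(dumps({"ok": True}))
```

This is why runner must be defined at module level in agent.py: a reference of this form can only reach names that exist once the module is imported.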

Run it

Start the agent. The CLI prints a join link. Open it to talk to your agent in the browser.
uv run agent.py run
The agent greets you as soon as you join the call. The join link is a browser demo for testing. To embed the agent in your own app, use Stream’s Video SDKs on your target platform; your client and the agent join the same call.
The Core Reference in the sidebar covers the Agent, LLM, and processor APIs in depth. Skip it until your first agent runs, then come back when you need API details.

Next steps

  • Voice Agents: custom STT/LLM/TTS pipelines, function calling, provider options
  • Video Agents: VLMs, YOLO processors, real-time video analysis
  • Deploy to Production: Docker, Kubernetes, and monitoring
  • Browse Integrations: 25+ AI providers to mix and match