You’ll build a real-time voice agent you can talk to in your browser, using Gemini Realtime on Stream’s edge network. About 18 lines of Python.
Build your agent
Set up your project
Create a project directory and install Vision Agents with the getstream and gemini plugins. If you don’t have uv yet, install it first.
Add your API keys
Get the keys you’ll need:
- Create a free Stream account for STREAM_API_KEY and STREAM_API_SECRET.
- Get an API key from Google AI Studio for GOOGLE_API_KEY.
Add them to a .env file in the project root. Vision Agents auto-loads these for each plugin.
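Before running the agent, it can help to confirm that all three keys are actually set. The following is a minimal sketch of an optional preflight check using only the Python standard library. It is not part of Vision Agents: the helper names (`parse_env_text`, `missing_keys`, `check_env_file`) are hypothetical, and the parser assumes simple `KEY=VALUE` lines, which may not cover every syntax a real .env loader accepts.

```python
import os
from pathlib import Path

# Key names taken from the list above.
REQUIRED_KEYS = ("STREAM_API_KEY", "STREAM_API_SECRET", "GOOGLE_API_KEY")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments.

    Hypothetical helper: handles only the basic .env syntax.
    """
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty, in REQUIRED_KEYS order."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def check_env_file(path: str = ".env") -> list[str]:
    """Combine the .env file (if present) with the process environment.

    Values already set in the process environment take precedence.
    """
    env_path = Path(path)
    file_values = parse_env_text(env_path.read_text()) if env_path.is_file() else {}
    return missing_keys({**file_values, **os.environ})


if __name__ == "__main__":
    missing = check_env_file()
    print("Missing keys: " + ", ".join(missing) if missing else "All keys present.")
```

Saved under any name in the project root (for example a hypothetical `check_env.py`), the script reports which of the three keys still need a value.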
Write the agent
Create main.py. The agent joins a Stream call and responds via Gemini Realtime.
Next steps
- Voice Agents: custom STT/LLM/TTS pipelines, function calling, provider options
- Video Agents: VLMs, YOLO processors, real-time video analysis
- Deploy to Production: Docker, Kubernetes, and monitoring
- Browse Integrations: 25+ AI providers to mix and match

