Stream is the default edge transport for Vision Agents. The getstream plugin connects your agent to a Stream Video call over WebRTC and exposes Stream’s call platform (chat-backed conversation history, custom events, recording, transcription, broadcasting, and frontend SDKs for every major client) through the same EdgeTransport interface used by every other transport.
Vision Agents requires a Stream account for real-time transport. Stream offers 333,000 free participant minutes monthly, plus additional credits through the Maker Program for indie developers. Most AI providers also offer free tiers.
Why Stream Video RTC
- Sub-500ms global latency. Agents connect through Stream’s edge network with PoPs worldwide — the same infrastructure that powers Stream Video for production telehealth, voice support, and live coaching apps.
- The default in every example. All the LLM, STT, TTS, vision, and realtime guides in these docs use getstream.Edge(). Swap providers freely; the edge stays the same.
- Audio + video + screen share. The plugin subscribes to audio, video, screen-share, and screen-share-audio tracks for every remote participant and re-publishes the agent’s own audio and video.
- Chat-backed conversation history. StreamConversation mirrors the message history to a Stream Chat channel attached to the call, with markdown-aware chunking and ephemeral updates while the LLM is still generating, so your frontend can render transcripts and tool output in real time.
- Custom events to every participant. Push arbitrary JSON to clients via send_custom_event(...) (payload capped at 5 KB by the platform), which is useful for surfacing tool calls, UI hints, or telemetry.
- Built-in demo helper. open_demo(call) provisions a guest user, joins them to the chat channel, mints a token, and opens Stream’s hosted demo UI in your browser. Handy before you wire up a real client.
- Rich event surface. The plugin re-exports Stream’s call.* events (recording started/stopped, transcription ready, closed captions, HLS/RTMP broadcasting state, moderation actions, member updates, and more) so you can react to platform state from the agent’s event bus.
- First-class frontend SDKs. Web, React, React Native, iOS, Android, Flutter, and Unity clients join the same call as your server-side agent.
- Generous free tier. See Stream’s pricing for details.
Installation
Quick Start
Set STREAM_API_KEY and STREAM_API_SECRET from your Stream dashboard, then drop getstream.Edge() into your agent. The credentials are read by the underlying getstream Python client, so Edge() takes no required arguments.
Run the agent with uv run main.py run. The CLI prints a join link you can open in any browser.
Environment Variables
Credentials and base URL are read by the underlying getstream client at construction time.
| Variable | Default | Description |
|---|---|---|
| STREAM_API_KEY | (none) | API key from your Stream app dashboard. Required. |
| STREAM_API_SECRET | (none) | API secret used to mint server-side tokens. Required. |
| STREAM_BASE_URL | (none) | Override the Stream API base URL. Only set if instructed to by Stream support. |
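
Because the two required variables are read when the client is constructed, it can help to fail early with a clear message if either is missing. The following is a minimal sketch using only the standard library; the helper name missing_stream_credentials is hypothetical and is not part of the getstream plugin.

```python
import os

# The two variables the table above marks as required.
REQUIRED_VARS = ("STREAM_API_KEY", "STREAM_API_SECRET")


def missing_stream_credentials(env=None):
    """Return the names of required Stream variables that are unset or blank.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    can be replaced with a plain dict in tests.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling missing_stream_credentials() before building the agent and stopping when the returned list is non-empty gives a clearer error than a failure inside the client.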
Conversation Persistence
When an agent joins a call, the plugin creates a messaging channel with the same ID as the call and wires it up as the agent’s Conversation. Every message the agent produces (and every user message it observes) is mirrored to that channel.
The mirror does three things you don’t get with a plain in-memory conversation:
- Markdown-aware chunking — long messages are split into ~1000-character pieces, preserving code-block boundaries so frontend renderers don’t break mid-fence.
- Streaming updates — chunks are sent as ephemeral messages while the LLM is still generating, then finalized when generation completes. Clients see partial text update in real time.
- Bidirectional history — anything posted to the channel from a client SDK is available to the agent for memory or RAG.
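
To make the idea of markdown-aware chunking concrete, here is a rough sketch of one way to split a message into pieces of about 1000 characters without cutting through a fenced code block. It is an illustration under simplifying assumptions, not the plugin's actual implementation: split_markdown is a hypothetical function, fences are detected by a leading triple backtick, and a single line or fenced block longer than the limit is kept whole rather than split.

```python
def split_markdown(text: str, limit: int = 1000) -> list[str]:
    """Split text into chunks of roughly `limit` characters, keeping fenced blocks intact."""
    # Step 1: group lines into units. Each prose line is one unit and each
    # fenced code block (opening marker through closing marker) is one unit.
    units: list[str] = []
    fence: list[str] = []
    in_fence = False
    for line in text.splitlines(keepends=True):
        is_marker = line.lstrip().startswith("```")
        if in_fence:
            fence.append(line)
            if is_marker:  # closing marker ends the fenced unit
                units.append("".join(fence))
                fence, in_fence = [], False
        elif is_marker:  # opening marker starts a fenced unit
            fence, in_fence = [line], True
        else:
            units.append(line)
    if fence:  # unterminated fence: keep the remainder as one unit
        units.append("".join(fence))

    # Step 2: greedily pack units into chunks. A unit larger than `limit`
    # becomes its own oversized chunk instead of being cut.
    chunks: list[str] = []
    current = ""
    for unit in units:
        if current and len(current) + len(unit) > limit:
            chunks.append(current)
            current = ""
        current += unit
    if current:
        chunks.append(current)
    return chunks
```

Joining the returned chunks reproduces the original text, and every chunk contains either a complete fenced block or none of it, which is the property a frontend renderer needs.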
Custom Events
Push arbitrary JSON to every participant watching the call. Clients subscribe with call.on("custom", callback) in any frontend SDK. The payload is capped at 5 KB by the platform.
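
Since the cap applies to the payload, checking its serialized size before sending can avoid a rejected event. The sketch below is a standard-library illustration with two stated assumptions: "5 KB" is read as 5 * 1024 bytes, and the payload is measured as compact UTF-8 JSON. The platform may count bytes differently, and the helper names are hypothetical.

```python
import json

# Assumption: "5 KB" interpreted as 5120 bytes; the exact platform limit may differ.
MAX_CUSTOM_EVENT_BYTES = 5 * 1024


def encoded_size(payload: dict) -> int:
    """Size in bytes of the payload serialized as compact UTF-8 JSON."""
    text = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def fits_custom_event(payload: dict, limit: int = MAX_CUSTOM_EVENT_BYTES) -> bool:
    """Return True if the serialized payload is within the size limit."""
    return encoded_size(payload) <= limit
```

A payload that does not fit can be trimmed, or replaced with a short reference that the client resolves through another channel such as the call's chat channel.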
Opening a Demo
open_demo(call) creates a guest user, ensures it has access to the chat channel, mints a short-lived token, and opens Stream’s hosted demo UI pointed at your call. Useful while iterating locally before you wire up a real frontend.
The base URL of the demo UI can be overridden with the EXAMPLE_BASE_URL environment variable.
Platform Events
The plugin registers every call.* event from Stream’s API as well as SFU-level participant and track events on the agent’s event bus. Subscribe to any of them with agent.events.subscribe(...):
- Participants & members: CallSessionParticipantJoined/Left, CallMemberAdded/Removed/Updated, CallSessionStarted/Ended.
- Recording: CallRecordingStarted/Stopped/Ready/Failed, CallFrameRecordingStarted/Stopped/FrameReady/Failed.
- Transcription & captions: CallTranscriptionStarted/Stopped/Ready/Failed, CallClosedCaptionsStarted/Stopped/Failed, ClosedCaptionEvent.
- Broadcasting: CallHLSBroadcastingStarted/Stopped/Failed, CallRtmpBroadcastStarted/Stopped/Failed.
- Lifecycle: CallCreated/Updated/Deleted/Ended, CallRing/Accepted/Rejected/Missed/Notification/Reaction.
- Moderation & permissions: CallModerationBlur/Warning, BlockedUser/UnblockedUser/KickedUser, PermissionRequest, UpdatedCallPermissions, CallUserMuted.
- Telemetry: CallStatsReportReady, CallUserFeedbackSubmitted.
Frontend SDKs
Your users connect with a Stream Video frontend SDK while your agent runs server-side with this plugin. Both join the same call.

| Platform | Docs |
|---|---|
| Web (vanilla JS / TypeScript) | Stream Video Web |
| React | Stream Video React |
| React Native | Stream Video React Native |
| iOS (Swift) | Stream Video iOS |
| Android (Kotlin) | Stream Video Android |
| Flutter | Stream Video Flutter |
| Unity | Stream Video Unity |
Next Steps
Build a Voice Agent
Pair the Stream edge with custom STT/LLM/TTS plugins.
Build a Video Agent
Add real-time video understanding with VLMs and YOLO.
Chat & Memory
Use the call’s Stream Chat channel for transcripts and tool surfaces.
Deploying Agents
Containerize and scale agents across Stream’s edge network.

