Stream Video RTC

Stream is the default edge transport for Vision Agents. The getstream plugin connects your agent to a Stream Video call over WebRTC and exposes Stream’s call platform — chat-backed conversation history, custom events, recording, transcription, broadcasting, and frontend SDKs for every major client — through the same EdgeTransport interface used by every other transport.

Vision Agents requires a Stream account for real-time transport. Stream offers 333,000 free participant minutes monthly, plus additional credits through the Maker Program for indie developers. Most AI providers also offer free tiers.

Why Stream Video RTC

Sub-500ms global latency. Agents connect through Stream’s edge network with PoPs worldwide — the same infrastructure that powers Stream Video for production telehealth, voice support, and live coaching apps.
The default in every example. All the LLM, STT, TTS, vision, and realtime guides in these docs use getstream.Edge(). Swap providers freely; the edge stays the same.
Audio + video + screen share. The plugin subscribes to audio, video, screen-share, and screen-share-audio tracks for every remote participant and re-publishes the agent’s own audio and video.
Chat-backed conversation history. StreamConversation mirrors the message history to a Stream Chat channel attached to the call, with markdown-aware chunking and ephemeral updates while the LLM is still generating — so your frontend can render transcripts and tool output in real time.
Custom events to every participant. Push arbitrary JSON to clients via send_custom_event(...) (payload capped at 5 KB by the platform) — useful for surfacing tool calls, UI hints, or telemetry.
Built-in demo helper. open_demo(call) provisions a guest user, joins them to the chat channel, mints a token, and opens Stream’s hosted demo UI in your browser. Handy before you wire up a real client.
Rich event surface. The plugin re-exports Stream’s call.* events — recording started/stopped, transcription ready, closed captions, HLS/RTMP broadcasting state, moderation actions, member updates, and more — so you can react to platform state from the agent’s event bus.
First-class frontend SDKs. Web, React, React Native, iOS, Android, Flutter, and Unity clients join the same call as your server-side agent.
Generous free tier. 333,000 free participant minutes per month; see Stream’s pricing for details.

Installation

uv add "vision-agents[getstream]"

Quick Start

Set STREAM_API_KEY and STREAM_API_SECRET from your Stream dashboard, then drop getstream.Edge() into your agent. The credentials are read by the underlying getstream Python client — Edge() takes no required arguments.

from dotenv import load_dotenv

from vision_agents.core import Agent, AgentLauncher, User, Runner
from vision_agents.plugins import getstream, gemini

load_dotenv()


async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Assistant", id="agent"),
        instructions="You're a helpful voice assistant. Be concise.",
        llm=gemini.Realtime(),
    )


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    # Create the call the agent will join.
    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        # Have the agent speak first.
        await agent.simple_response("Greet the user")
        # Keep running until the call ends.
        await agent.finish()


if __name__ == "__main__":
    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()

Save the script as main.py and start it with uv run main.py run. The CLI prints a join link you can open in any browser.

Environment Variables

Credentials and base URL are read by the underlying getstream client at construction time.

Variable	Default	Description
`STREAM_API_KEY`	—	API key from your Stream app dashboard. Required.
`STREAM_API_SECRET`	—	API secret used to mint server-side tokens. Required.
`STREAM_BASE_URL`	—	Override the Stream API base URL. Only set if instructed to by Stream support.
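
Both credentials are required, so it can help to fail fast with a clear message before constructing the edge. The check below is a minimal sketch using only the standard library; the helper name missing_stream_credentials is hypothetical and not part of the plugin.

```python
import os

# Variables the table above marks as required.
REQUIRED_VARS = ("STREAM_API_KEY", "STREAM_API_SECRET")


def missing_stream_credentials(env=None) -> list[str]:
    """Return the names of required Stream variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling it right after load_dotenv() and exiting when the returned list is non-empty gives a clearer error than a failure deep inside the client.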

Conversation Persistence

When an agent joins a call, the plugin creates a messaging channel with the same ID as the call and wires it up as the agent’s Conversation. Every message the agent produces (and every user message it observes) is mirrored to that channel. The mirror does three things you don’t get with a plain in-memory conversation:

Markdown-aware chunking — long messages are split into ~1000-character pieces, preserving code-block boundaries so frontend renderers don’t break mid-fence.
Streaming updates — chunks are sent as ephemeral messages while the LLM is still generating, then finalized when generation completes. Clients see partial text update in real time.
Bidirectional history — anything posted to the channel from a client SDK is available to the agent for memory or RAG.
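
To make the chunking behaviour above concrete, here is a minimal sketch of fence-aware splitting. It is an illustration, not the plugin's implementation: the function name chunk_markdown, the line-based strategy, and the handling of oversized blocks are all assumptions.

```python
def chunk_markdown(text: str, limit: int = 1000) -> list[str]:
    """Split text into pieces of roughly `limit` characters, never inside a ``` fence.

    Illustrative only; the real plugin may chunk differently.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    in_fence = False
    for line in text.splitlines(keepends=True):
        # Start a new chunk only at a line boundary that is outside a code fence.
        if current and not in_fence and size + len(line) > limit:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
        # A line starting with ``` opens or closes a fenced code block.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
    if current:
        chunks.append("".join(current))
    return chunks
```

In this sketch a code block or single line longer than the limit is kept whole, so a chunk can exceed the target size. Joining the chunks always reproduces the input.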

Custom Events

Push arbitrary JSON to every participant watching the call. Clients subscribe with call.on("custom", callback) in any frontend SDK. The payload is capped at 5 KB by the platform.

await agent.edge.send_custom_event({
    "type": "tool_result",
    "tool": "search_orders",
    "result": {"order_id": "1234", "status": "shipped"},
})
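
Because the payload is capped, it can be useful to measure it before sending. The helper below is a sketch under two assumptions that the text above does not confirm: that 5 KB means 5,120 bytes, and that the limit applies to the payload serialized as compact JSON. The names MAX_CUSTOM_EVENT_BYTES, custom_event_size, and fits_custom_event_limit are hypothetical.

```python
import json

# Assumption: "5 KB" is read as 5 * 1024 bytes of encoded JSON.
MAX_CUSTOM_EVENT_BYTES = 5 * 1024


def custom_event_size(payload: dict) -> int:
    """Return the size in bytes of the payload serialized as compact UTF-8 JSON."""
    return len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))


def fits_custom_event_limit(payload: dict, limit: int = MAX_CUSTOM_EVENT_BYTES) -> bool:
    """Return True if the serialized payload is within the assumed limit."""
    return custom_event_size(payload) <= limit
```

A caller could check fits_custom_event_limit(payload) before send_custom_event(...) and trim or summarize large tool results when it returns False.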

Opening a Demo

open_demo(call) creates a guest user, ensures it has access to the chat channel, mints a short-lived token, and opens Stream’s hosted demo UI pointed at your call. Useful while iterating locally before you wire up a real frontend.

async with agent.join(call):
    await agent.edge.open_demo(call)
    await agent.finish()

The base URL of the hosted demo UI can be overridden via the EXAMPLE_BASE_URL environment variable.

Platform Events

The plugin registers every call.* event from Stream’s API as well as SFU-level participant and track events on the agent’s event bus. Subscribe to any of them with agent.events.subscribe(...):

from vision_agents.plugins.getstream import (
    CallRecordingStartedEvent,
    CallTranscriptionReadyEvent,
    CallSessionParticipantJoinedEvent,
)

# `agent` is an existing Agent instance, e.g. the one returned by create_agent().
@agent.events.subscribe
async def on_recording(event: CallRecordingStartedEvent):
    print("Recording started:", event)

Notable categories of re-exported events (names are abbreviated here; the import example above shows the full form, e.g. CallRecordingStartedEvent):

Participants & members — CallSessionParticipantJoined/Left, CallMemberAdded/Removed/Updated, CallSessionStarted/Ended.
Recording — CallRecordingStarted/Stopped/Ready/Failed, CallFrameRecordingStarted/Stopped/FrameReady/Failed.
Transcription & captions — CallTranscriptionStarted/Stopped/Ready/Failed, CallClosedCaptionsStarted/Stopped/Failed, ClosedCaptionEvent.
Broadcasting — CallHLSBroadcastingStarted/Stopped/Failed, CallRtmpBroadcastStarted/Stopped/Failed.
Lifecycle — CallCreated/Updated/Deleted/Ended, CallRing/Accepted/Rejected/Missed/Notification/Reaction.
Moderation & permissions — CallModerationBlur/Warning, BlockedUser/UnblockedUser/KickedUser, PermissionRequest, UpdatedCallPermissions, CallUserMuted.
Telemetry — CallStatsReportReady, CallUserFeedbackSubmitted.

See the Events reference for the full schema of each event type.
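
In the subscription example above, the handler's parameter annotation names the event it handles. The toy bus below sketches how annotation-based dispatch of that kind can work. It is a stand-in for illustration only: ToyEventBus, RecordingStarted, and ParticipantJoined are hypothetical names, and this is not the Vision Agents implementation.

```python
import asyncio
import typing
from dataclasses import dataclass


@dataclass
class RecordingStarted:  # hypothetical stand-in for a platform event
    call_id: str


@dataclass
class ParticipantJoined:  # hypothetical stand-in for a platform event
    user_id: str


class ToyEventBus:
    """Routes each event to handlers whose first annotated parameter has its type."""

    def __init__(self) -> None:
        self._handlers: dict[type, list] = {}

    def subscribe(self, handler):
        # Read the handler's annotations and ignore the return annotation.
        hints = typing.get_type_hints(handler)
        hints.pop("return", None)
        event_type = next(iter(hints.values()))
        self._handlers.setdefault(event_type, []).append(handler)
        return handler

    async def send(self, event) -> None:
        # Only handlers registered for this exact event type are called.
        for handler in self._handlers.get(type(event), []):
            await handler(event)


bus = ToyEventBus()
seen: list[str] = []


@bus.subscribe
async def on_recording(event: RecordingStarted):
    seen.append(f"recording:{event.call_id}")


async def main() -> None:
    await bus.send(RecordingStarted(call_id="demo"))
    await bus.send(ParticipantJoined(user_id="u1"))  # no handler registered, ignored


asyncio.run(main())
print(seen)  # ['recording:demo']
```

Keying handlers by event type keeps each handler small and lets unrelated events pass through without any filtering code in the handler itself.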

Frontend SDKs

Your users connect with a Stream Video frontend SDK while your agent runs server-side with this plugin — both join the same call.

Platform	Docs
Web (vanilla JS / TypeScript)	Stream Video Web
React	Stream Video React
React Native	Stream Video React Native
iOS (Swift)	Stream Video iOS
Android (Kotlin)	Stream Video Android
Flutter	Stream Video Flutter
Unity	Stream Video Unity

Next Steps

Build a Voice Agent

Pair the Stream edge with custom STT/LLM/TTS plugins.

Build a Video Agent

Add real-time video understanding with VLMs and YOLO.

Chat & Memory

Use the call’s Stream Chat channel for transcripts and tool surfaces.

Deploying Agents

Containerize and scale agents across Stream’s edge network.
