
Stream is the default edge transport for Vision Agents. The getstream plugin connects your agent to a Stream Video call over WebRTC and exposes Stream’s call platform — chat-backed conversation history, custom events, recording, transcription, broadcasting, and frontend SDKs for every major client — through the same EdgeTransport interface used by every other transport.
Vision Agents requires a Stream account for real-time transport. Stream offers 333,000 free participant minutes monthly, plus additional credits through the Maker Program for indie developers. Most AI providers also offer free tiers.

Why Stream Video RTC

  • Sub-500ms global latency. Agents connect through Stream’s edge network with PoPs worldwide — the same infrastructure that powers Stream Video for production telehealth, voice support, and live coaching apps.
  • The default in every example. All the LLM, STT, TTS, vision, and realtime guides in these docs use getstream.Edge(). Swap providers freely; the edge stays the same.
  • Audio + video + screen share. The plugin subscribes to audio, video, screen-share, and screen-share-audio tracks for every remote participant and re-publishes the agent’s own audio and video.
  • Chat-backed conversation history. StreamConversation mirrors the message history to a Stream Chat channel attached to the call, with markdown-aware chunking and ephemeral updates while the LLM is still generating — so your frontend can render transcripts and tool output in real time.
  • Custom events to every participant. Push arbitrary JSON to clients via send_custom_event(...) (payload capped at 5 KB by the platform) — useful for surfacing tool calls, UI hints, or telemetry.
  • Built-in demo helper. open_demo(call) provisions a guest user, joins them to the chat channel, mints a token, and opens Stream’s hosted demo UI in your browser. Handy before you wire up a real client.
  • Rich event surface. The plugin re-exports Stream’s call.* events — recording started/stopped, transcription ready, closed captions, HLS/RTMP broadcasting state, moderation actions, member updates, and more — so you can react to platform state from the agent’s event bus.
  • First-class frontend SDKs. Web, React, React Native, iOS, Android, Flutter, and Unity clients join the same call as your server-side agent.
  • Generous free tier. 333,000 free participant minutes per month; see Stream’s pricing for details.

Installation

uv add "vision-agents[getstream]"

Quick Start

Set STREAM_API_KEY and STREAM_API_SECRET from your Stream dashboard, then drop getstream.Edge() into your agent. The credentials are read by the underlying getstream Python client — Edge() takes no required arguments.
from dotenv import load_dotenv

from vision_agents.core import Agent, AgentLauncher, User, Runner
from vision_agents.plugins import getstream, gemini

# Load STREAM_API_KEY, STREAM_API_SECRET, and provider keys from a local .env file.
load_dotenv()


# Factory that builds a fresh agent.
async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=getstream.Edge(),  # Stream transport; credentials come from the environment
        agent_user=User(name="Assistant", id="agent"),
        instructions="You're a helpful voice assistant. Be concise.",
        llm=gemini.Realtime(),
    )


# Create the call, join it, greet the user, then run until the call ends.
async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        await agent.simple_response("Greet the user")
        await agent.finish()


if __name__ == "__main__":
    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
Run with uv run main.py run. The CLI prints a join link you can open in any browser.

Environment Variables

Credentials and base URL are read by the underlying getstream client at construction time.
Variable | Default | Description
STREAM_API_KEY | - | API key from your Stream app dashboard. Required.
STREAM_API_SECRET | - | API secret used to mint server-side tokens. Required.
STREAM_BASE_URL | - | Override the Stream API base URL. Only set if instructed to by Stream support.
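
Because the credentials are read at construction time, it can help to fail early with a readable message when one is missing. The following is a minimal sketch using only the standard library; the helper names missing_stream_credentials and require_stream_credentials are hypothetical and not part of the plugin.

```python
import os

# The two variables the table above marks as required.
REQUIRED_VARS = ("STREAM_API_KEY", "STREAM_API_SECRET")


def missing_stream_credentials(env=None):
    """Return the names of required Stream variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def require_stream_credentials(env=None):
    """Raise a descriptive error if any required credential is missing."""
    missing = missing_stream_credentials(env)
    if missing:
        raise RuntimeError("Missing Stream credentials: " + ", ".join(missing))
```

Calling require_stream_credentials() right after load_dotenv() would surface a misconfigured environment before the agent tries to connect.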

Conversation Persistence

When an agent joins a call, the plugin creates a messaging channel with the same ID as the call and wires it up as the agent’s Conversation. Every message the agent produces (and every user message it observes) is mirrored to that channel. The mirror does three things you don’t get with a plain in-memory conversation:
  • Markdown-aware chunking — long messages are split into ~1000-character pieces, preserving code-block boundaries so frontend renderers don’t break mid-fence.
  • Streaming updates — chunks are sent as ephemeral messages while the LLM is still generating, then finalized when generation completes. Clients see partial text update in real time.
  • Bidirectional history — anything posted to the channel from a client SDK is available to the agent for memory or RAG.
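
To make the chunking idea concrete, here is an illustrative sketch of splitting a message at line boundaries while keeping each fenced code block whole. It is not the plugin's implementation: the function name chunk_markdown, the greedy packing strategy, and the choice to let an oversized code block exceed the limit rather than split it are all assumptions made for the example.

```python
FENCE = "`" * 3  # the Markdown code-fence marker


def chunk_markdown(text: str, limit: int = 1000) -> list[str]:
    """Split text into chunks of roughly `limit` characters without cutting a fenced block."""
    # Step 1: group lines into units. A fenced code block is one unit;
    # every other line is its own unit.
    units: list[str] = []
    fence_lines: list[str] = []
    in_fence = False
    for line in text.splitlines(keepends=True):
        is_fence = line.lstrip().startswith(FENCE)
        if in_fence:
            fence_lines.append(line)
            if is_fence:  # closing fence: the block is complete
                units.append("".join(fence_lines))
                fence_lines = []
                in_fence = False
        elif is_fence:  # opening fence
            in_fence = True
            fence_lines = [line]
        else:
            units.append(line)
    if fence_lines:  # unterminated fence: keep the remainder together
        units.append("".join(fence_lines))

    # Step 2: pack units greedily. A single unit longer than `limit`
    # becomes its own oversized chunk instead of being split.
    chunks: list[str] = []
    current = ""
    for unit in units:
        if current and len(current) + len(unit) > limit:
            chunks.append(current)
            current = ""
        current += unit
    if current:
        chunks.append(current)
    return chunks
```

Joining the returned chunks reproduces the original text, so a client that concatenates them in order loses nothing.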

Custom Events

Push arbitrary JSON to every participant watching the call. Clients subscribe with call.on("custom", callback) in any frontend SDK. The payload is capped at 5 KB by the platform.
await agent.edge.send_custom_event({
    "type": "tool_result",
    "tool": "search_orders",
    "result": {"order_id": "1234", "status": "shipped"},
})
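
Since oversized payloads are rejected by the platform, a size check before sending can be useful. The sketch below measures the payload as compact UTF-8 JSON. It assumes "5 KB" means 5 * 1024 bytes and that the platform counts serialized bytes in roughly this way; neither detail is confirmed here, and the helper names are hypothetical.

```python
import json

# Assumption: the 5 KB cap is interpreted as 5 * 1024 bytes.
MAX_CUSTOM_EVENT_BYTES = 5 * 1024


def custom_event_size(payload: dict) -> int:
    """Size in bytes of the payload serialized as compact UTF-8 JSON."""
    encoded = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def fits_custom_event_limit(payload: dict, limit: int = MAX_CUSTOM_EVENT_BYTES) -> bool:
    """True if the serialized payload is within the assumed size cap."""
    return custom_event_size(payload) <= limit
```

If a payload does not fit, one option is to send a short reference (for example an ID the client can look up) instead of the full result.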

Opening a Demo

open_demo(call) creates a guest user, ensures it has access to the chat channel, mints a short-lived token, and opens Stream’s hosted demo UI pointed at your call. Useful while iterating locally before you wire up a real frontend.
async with agent.join(call):
    await agent.edge.open_demo(call)
    await agent.finish()
The base URL of the hosted demo UI can be overridden via the EXAMPLE_BASE_URL environment variable.

Platform Events

The plugin registers every call.* event from Stream’s API as well as SFU-level participant and track events on the agent’s event bus. Subscribe to any of them with agent.events.subscribe(...):
from vision_agents.plugins.getstream import (
    CallRecordingStartedEvent,
    CallTranscriptionReadyEvent,
    CallSessionParticipantJoinedEvent,
)

@agent.events.subscribe
async def on_recording(event: CallRecordingStartedEvent):
    print("Recording started:", event)
Notable categories of events that are re-exported:
  • Participants & members: CallSessionParticipantJoined/Left, CallMemberAdded/Removed/Updated, CallSessionStarted/Ended.
  • Recording: CallRecordingStarted/Stopped/Ready/Failed, CallFrameRecordingStarted/Stopped/FrameReady/Failed.
  • Transcription & captions: CallTranscriptionStarted/Stopped/Ready/Failed, CallClosedCaptionsStarted/Stopped/Failed, ClosedCaptionEvent.
  • Broadcasting: CallHLSBroadcastingStarted/Stopped/Failed, CallRtmpBroadcastStarted/Stopped/Failed.
  • Lifecycle: CallCreated/Updated/Deleted/Ended, CallRing/Accepted/Rejected/Missed/Notification/Reaction.
  • Moderation & permissions: CallModerationBlur/Warning, BlockedUser/UnblockedUser/KickedUser, PermissionRequest, UpdatedCallPermissions, CallUserMuted.
  • Telemetry: CallStatsReportReady, CallUserFeedbackSubmitted.
See the Events reference for the full schema of each event type.
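
In the subscription example, the handler's parameter annotation names the event it wants. The toy bus below illustrates how a decorator can route events by that annotation. It is a self-contained sketch, not the Vision Agents event bus: TinyEventBus, RecordingStarted, and ParticipantJoined are hypothetical stand-ins, and the real bus may match and dispatch events differently.

```python
import asyncio
import inspect
from collections import defaultdict
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class RecordingStarted:
    """Hypothetical stand-in for an event such as CallRecordingStartedEvent."""
    call_cid: str


@dataclass
class ParticipantJoined:
    """Hypothetical stand-in for a participant-joined event."""
    user_id: str


class TinyEventBus:
    """Toy bus: routes each event to handlers annotated with its exact type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, handler):
        # Read the annotation on the handler's first parameter.
        first_param = next(iter(inspect.signature(handler).parameters))
        event_type = get_type_hints(handler)[first_param]
        self._handlers[event_type].append(handler)
        return handler  # returning the handler lets subscribe act as a decorator

    async def emit(self, event):
        # Only handlers registered for this exact event type are awaited.
        for handler in self._handlers.get(type(event), []):
            await handler(event)


def demo():
    bus = TinyEventBus()
    seen = []

    @bus.subscribe
    async def on_recording(event: RecordingStarted):
        seen.append(("recording", event.call_cid))

    @bus.subscribe
    async def on_join(event: ParticipantJoined):
        seen.append(("joined", event.user_id))

    # Emitting a ParticipantJoined event reaches on_join but not on_recording.
    asyncio.run(bus.emit(ParticipantJoined(user_id="alice")))
    return seen
```

The practical takeaway for the real API is the same: annotate each handler with the specific event class you care about, and keep one handler per event type.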

Frontend SDKs

Your users connect with a Stream Video frontend SDK while your agent runs server-side with this plugin — both join the same call.
Platform | Docs
Web (vanilla JS / TypeScript) | Stream Video Web
React | Stream Video React
React Native | Stream Video React Native
iOS (Swift) | Stream Video iOS
Android (Kotlin) | Stream Video Android
Flutter | Stream Video Flutter
Unity | Stream Video Unity

Next Steps

  • Build a Voice Agent. Pair the Stream edge with custom STT/LLM/TTS plugins.
  • Build a Video Agent. Add real-time video understanding with VLMs and YOLO.
  • Chat & Memory. Use the call’s Stream Chat channel for transcripts and tool surfaces.
  • Deploying Agents. Containerize and scale agents across Stream’s edge network.