# Stream Video RTC

[Stream](https://getstream.io/video/) is the default edge transport for Vision Agents. The `getstream` plugin connects your agent to a Stream Video call over WebRTC and exposes Stream's call platform — chat-backed conversation history, custom events, recording, transcription, broadcasting, and frontend SDKs for every major client — through the same `EdgeTransport` interface used by every other transport.

<Info>
  Vision Agents requires a [Stream](https://getstream.io/try-for-free/) account for real-time transport. Stream offers 333,000 free participant minutes monthly, plus additional credits through the [Maker Program](https://getstream.io/chat/pricing/#free-for-maker) for indie developers. Most AI providers also offer free tiers.
</Info>

## Why Stream Video RTC

* **Sub-500ms global latency.** Agents connect through Stream's edge network with PoPs worldwide — the same infrastructure that powers Stream Video for production telehealth, voice support, and live coaching apps.
* **The default in every example.** All the LLM, STT, TTS, vision, and realtime guides in these docs use `getstream.Edge()`. Swap providers freely; the edge stays the same.
* **Audio + video + screen share.** The plugin subscribes to audio, video, screen-share, and screen-share-audio tracks for every remote participant and re-publishes the agent's own audio and video.
* **Chat-backed conversation history.** `StreamConversation` mirrors the message history to a Stream Chat channel attached to the call, with markdown-aware chunking and ephemeral updates while the LLM is still generating — so your frontend can render transcripts and tool output in real time.
* **Custom events to every participant.** Push arbitrary JSON to clients via `send_custom_event(...)` (payload capped at 5 KB by the platform) — useful for surfacing tool calls, UI hints, or telemetry.
* **Built-in demo helper.** `open_demo(call)` provisions a guest user, joins them to the chat channel, mints a token, and opens [Stream's hosted demo UI](https://getstream.io/video/demos) in your browser. Handy before you wire up a real client.
* **Rich event surface.** The plugin re-exports Stream's `call.*` events — recording started/stopped, transcription ready, closed captions, HLS/RTMP broadcasting state, moderation actions, member updates, and more — so you can react to platform state from the agent's event bus.
* **First-class frontend SDKs.** Web, React, React Native, iOS, Android, Flutter, and Unity clients join the same call as your server-side agent.
* **Generous free tier.** Stream includes 333,000 free participant minutes per month; see Stream's [pricing](https://getstream.io/video/pricing/) for current terms.

## Installation

```sh
uv add "vision-agents[getstream]"
```

## Quick Start

Set `STREAM_API_KEY` and `STREAM_API_SECRET` from your [Stream dashboard](https://getstream.io/try-for-free/), then drop `getstream.Edge()` into your agent. The credentials are read by the underlying [`getstream`](https://pypi.org/project/getstream/) Python client — `Edge()` takes no required arguments.

```python
from dotenv import load_dotenv

from vision_agents.core import Agent, AgentLauncher, User, Runner
from vision_agents.plugins import getstream, gemini

load_dotenv()


async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Assistant", id="agent"),
        instructions="You're a helpful voice assistant. Be concise.",
        llm=gemini.Realtime(),
    )


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        await agent.simple_response("Greet the user")
        await agent.finish()


if __name__ == "__main__":
    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
```

Run with `uv run main.py run`. The CLI prints a join link you can open in any browser.

## Environment Variables

Credentials and base URL are read by the underlying `getstream` client at construction time.

| Variable            | Default | Description                                                                    |
| ------------------- | ------- | ------------------------------------------------------------------------------ |
| `STREAM_API_KEY`    | —       | API key from your Stream app dashboard. Required.                              |
| `STREAM_API_SECRET` | —       | API secret used to mint server-side tokens. Required.                          |
| `STREAM_BASE_URL`   | —       | Override the Stream API base URL. Only set if instructed to by Stream support. |
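
Because both credentials are required, it can help to fail fast with a readable message before the agent is constructed. The following is a minimal sketch using only the standard library; the helper names `missing_stream_credentials` and `require_stream_credentials` are hypothetical and not part of the plugin.

```python
import os

# The two variables the table above marks as required.
REQUIRED_VARS = ("STREAM_API_KEY", "STREAM_API_SECRET")


def missing_stream_credentials(env=None):
    """Return the names of required Stream variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def require_stream_credentials(env=None):
    """Raise a readable error if any required credential is missing (hypothetical helper)."""
    missing = missing_stream_credentials(env)
    if missing:
        raise RuntimeError("Missing Stream credentials: " + ", ".join(missing))
```

Calling `require_stream_credentials()` right after `load_dotenv()` would surface a misconfigured `.env` file before any network connection is attempted.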

## Conversation Persistence

When an agent joins a call, the plugin creates a `messaging` channel with the same ID as the call and wires it up as the agent's [`Conversation`](/guides/chat-and-memory). Every message the agent produces (and every user message it observes) is mirrored to that channel.

The mirror does three things you don't get with a plain in-memory conversation:

* **Markdown-aware chunking** — long messages are split into \~1000-character pieces, preserving code-block boundaries so frontend renderers don't break mid-fence.
* **Streaming updates** — chunks are sent as ephemeral messages while the LLM is still generating, then finalized when generation completes. Clients see partial text update in real time.
* **Bidirectional history** — anything posted to the channel from a client SDK is available to the agent for memory or RAG.
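
To make the chunking idea concrete, here is an illustrative sketch of splitting text near a character limit without cutting through a fenced code block. It is not the plugin's implementation: the function name `chunk_markdown` is hypothetical, and as a simplification a fenced block that exceeds the limit is kept whole rather than split.

```python
def chunk_markdown(text: str, limit: int = 1000) -> list[str]:
    """Split text into pieces of roughly `limit` characters, never inside a code fence."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    in_fence = False
    for line in text.splitlines(keepends=True):
        # Start a new chunk only between lines that sit outside a code fence.
        if current and not in_fence and size + len(line) > limit:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
        # A line starting with ``` opens or closes a fenced block.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
    if current:
        chunks.append("".join(current))
    return chunks
```

Joining the returned chunks reproduces the input exactly, so a client that concatenates the pieces sees the original message.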

## Custom Events

Push arbitrary JSON to every participant watching the call. Clients subscribe with `call.on("custom", callback)` in any frontend SDK. The payload is capped at 5 KB by the platform.

```python
await agent.edge.send_custom_event({
    "type": "tool_result",
    "tool": "search_orders",
    "result": {"order_id": "1234", "status": "shipped"},
})
```
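
Given the 5 KB cap, a size check before sending can catch oversized payloads early. This sketch rests on two assumptions: that "5 KB" means 5120 bytes, and that the serialized JSON size is a reasonable proxy for how the platform measures the payload. The helper names are hypothetical.

```python
import json

# Assumption: "5 KB" is read as 5 * 1024 bytes; the platform's exact limit may differ.
MAX_CUSTOM_EVENT_BYTES = 5 * 1024


def custom_event_size(payload: dict) -> int:
    """Size in bytes of the payload serialized as compact UTF-8 JSON."""
    return len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))


def fits_custom_event(payload: dict, limit: int = MAX_CUSTOM_EVENT_BYTES) -> bool:
    """Return True if the serialized payload is within the assumed size limit."""
    return custom_event_size(payload) <= limit
```

A payload that fails the check could be trimmed, or replaced with an identifier the client uses to fetch the full result elsewhere.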

## Opening a Demo

`open_demo(call)` creates a guest user, ensures it has access to the chat channel, mints a short-lived token, and opens [Stream's hosted demo UI](https://getstream.io/video/demos) pointed at your call. Useful while iterating locally before you wire up a real frontend.

```python
async with agent.join(call):
    await agent.edge.open_demo(call)
    await agent.finish()
```

The demo UI's base URL can be overridden via the `EXAMPLE_BASE_URL` environment variable.

## Platform Events

The plugin registers every `call.*` event from Stream's API as well as SFU-level participant and track events on the agent's event bus. Subscribe to any of them with `agent.events.subscribe(...)`:

```python
from vision_agents.plugins.getstream import (
    CallRecordingStartedEvent,
    CallTranscriptionReadyEvent,
    CallSessionParticipantJoinedEvent,
)

@agent.events.subscribe
async def on_recording(event: CallRecordingStartedEvent):
    print("Recording started:", event)
```
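
The handler above names its event type only in the parameter annotation, which suggests the decorator routes on that annotation. The toy bus below sketches that pattern in plain Python. It is an assumption about the mechanism, not code from Vision Agents, and `ToyEventBus` and `RecordingStarted` are hypothetical stand-ins.

```python
import asyncio
import inspect
from dataclasses import dataclass


@dataclass
class RecordingStarted:
    """Hypothetical stand-in for a plugin event class."""
    call_id: str


class ToyEventBus:
    """Toy bus: routes each event to handlers whose first parameter is annotated with its type."""

    def __init__(self):
        self._handlers = {}

    def subscribe(self, handler):
        # Read the annotation of the handler's first parameter to learn the event type.
        params = list(inspect.signature(handler).parameters.values())
        event_type = params[0].annotation
        self._handlers.setdefault(event_type, []).append(handler)
        return handler

    async def send(self, event):
        # Call every handler registered for this exact event type.
        for handler in self._handlers.get(type(event), []):
            await handler(event)


bus = ToyEventBus()
seen = []


@bus.subscribe
async def on_recording(event: RecordingStarted):
    seen.append(event.call_id)


asyncio.run(bus.send(RecordingStarted(call_id="demo")))
```

Routing on the annotation keeps the subscription and the handler's expected type in one place, so the two cannot drift apart.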

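Notable categories of re-exported events are listed below. Names are abbreviated: slash-separated variants share the leading prefix, and the importable classes end in `Event` (for example, `CallRecordingStartedEvent`):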

* **Participants & members** — `CallSessionParticipantJoined/Left`, `CallMemberAdded/Removed/Updated`, `CallSessionStarted/Ended`.
* **Recording** — `CallRecordingStarted/Stopped/Ready/Failed`, `CallFrameRecordingStarted/Stopped/FrameReady/Failed`.
* **Transcription & captions** — `CallTranscriptionStarted/Stopped/Ready/Failed`, `CallClosedCaptionsStarted/Stopped/Failed`, `ClosedCaptionEvent`.
* **Broadcasting** — `CallHLSBroadcastingStarted/Stopped/Failed`, `CallRtmpBroadcastStarted/Stopped/Failed`.
* **Lifecycle** — `CallCreated/Updated/Deleted/Ended`, `CallRing/Accepted/Rejected/Missed/Notification/Reaction`.
* **Moderation & permissions** — `CallModerationBlur/Warning`, `BlockedUser/UnblockedUser/KickedUser`, `PermissionRequest`, `UpdatedCallPermissions`, `CallUserMuted`.
* **Telemetry** — `CallStatsReportReady`, `CallUserFeedbackSubmitted`.

See the [Events reference](/reference/events-reference) for the full schema of each event type.

## Frontend SDKs

Your users connect with a Stream Video frontend SDK while your agent runs server-side with this plugin — both join the same call.

| Platform                      | Docs                                                                      |
| ----------------------------- | ------------------------------------------------------------------------- |
| Web (vanilla JS / TypeScript) | [Stream Video Web](https://getstream.io/video/docs/api/)                  |
| React                         | [Stream Video React](https://getstream.io/video/docs/react/)              |
| React Native                  | [Stream Video React Native](https://getstream.io/video/docs/reactnative/) |
| iOS (Swift)                   | [Stream Video iOS](https://getstream.io/video/docs/ios/)                  |
| Android (Kotlin)              | [Stream Video Android](https://getstream.io/video/docs/android/)          |
| Flutter                       | [Stream Video Flutter](https://getstream.io/video/docs/flutter/)          |
| Unity                         | [Stream Video Unity](https://getstream.io/video/docs/unity/)              |

## Next Steps

<CardGroup cols={2}>
  <Card title="Build a Voice Agent" icon="microphone" href="/introduction/voice-agents">
    Pair the Stream edge with custom STT/LLM/TTS plugins.
  </Card>

  <Card title="Build a Video Agent" icon="video" href="/introduction/video-agents">
    Add real-time video understanding with VLMs and YOLO.
  </Card>

  <Card title="Chat & Memory" icon="comments" href="/guides/chat-and-memory">
    Use the call's Stream Chat channel for transcripts and tool surfaces.
  </Card>

  <Card title="Deploying Agents" icon="rocket" href="/guides/deploying-overview">
    Containerize and scale agents across Stream's edge network.
  </Card>
</CardGroup>
