Vision Agents uses Stream Video for real-time WebRTC transport by default. External WebRTC transports are supported as well. Most AI providers offer free tiers to get started.
Anam provides API keys and avatar IDs through their dashboard.
Installation
Quick Start
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| avatar_id | str | None | Anam avatar ID (defaults to ANAM_AVATAR_ID env var) |
| api_key | str | None | API key (defaults to ANAM_API_KEY env var) |
| client_options | ClientOptions | None | Advanced Anam client configuration |
| connect_timeout | float | None | Seconds to wait for connection (None = wait indefinitely) |
| session_ready_timeout | float | None | Seconds to wait for session ready (None = wait indefinitely) |
| width | int | 720 | Output video width in pixels |
| height | int | 480 | Output video height in pixels |
| fps | int | 30 | Output video frame rate. Must be > 0. |
| buffer_seconds | float | 1.0 | Max video buffer depth in seconds ahead of audio playback. Must be > 0. |
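The defaults and constraints in the table can be illustrated with a small standalone sketch. `AvatarSettings` and `max_buffered_frames` are hypothetical names invented for this example; they are not the plugin's classes, and the real implementation may resolve and validate these values differently. The frame-count calculation is only an assumption about how a buffer depth in seconds relates to a frame rate.

```python
import math
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class AvatarSettings:
    """Hypothetical stand-in mirroring the parameter table (not the plugin's class)."""

    avatar_id: Optional[str] = None
    api_key: Optional[str] = None
    width: int = 720
    height: int = 480
    fps: int = 30
    buffer_seconds: float = 1.0

    def __post_init__(self) -> None:
        # Fall back to the environment variables named in the table.
        self.avatar_id = self.avatar_id or os.environ.get("ANAM_AVATAR_ID")
        self.api_key = self.api_key or os.environ.get("ANAM_API_KEY")
        # Enforce the "Must be > 0" constraints from the table.
        if self.fps <= 0:
            raise ValueError("fps must be > 0")
        if self.buffer_seconds <= 0:
            raise ValueError("buffer_seconds must be > 0")

    @property
    def max_buffered_frames(self) -> int:
        # Assumed relationship: buffer depth in seconds times frames per second.
        return math.ceil(self.fps * self.buffer_seconds)
```

With the table's defaults (30 fps, 1.0 s), this sketch would allow up to 30 buffered video frames ahead of audio playback.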
How It Works
- Agent TTS audio is resampled to 24 kHz mono and streamed to Anam
- Anam generates lip-synced avatar video and audio from the input
- Avatar video and audio frames are streamed back to call participants via Stream Edge
- When a user starts speaking, the avatar is automatically interrupted
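The first step above, converting TTS audio to 24 kHz mono, can be sketched in plain Python. This is a minimal illustration, not the plugin's resampler: `to_mono` and `resample_linear` are hypothetical helpers, the input is assumed to be interleaved float samples, and linear interpolation is used only for brevity (a production pipeline would normally use a filtered resampler to avoid aliasing).

```python
from typing import List, Sequence

TARGET_RATE = 24_000  # sample rate named in the steps above


def to_mono(samples: Sequence[float], channels: int) -> List[float]:
    """Average interleaved channels into a single mono channel."""
    if channels < 1:
        raise ValueError("channels must be >= 1")
    if len(samples) % channels:
        raise ValueError("sample count must be a multiple of channels")
    return [
        sum(samples[i:i + channels]) / channels
        for i in range(0, len(samples), channels)
    ]


def resample_linear(
    mono: Sequence[float], src_rate: int, dst_rate: int = TARGET_RATE
) -> List[float]:
    """Resample mono audio by linear interpolation (no anti-aliasing filter)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be > 0")
    if not mono:
        return []
    if src_rate == dst_rate:
        return list(mono)
    out_len = max(1, round(len(mono) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(mono) - 1
    out: List[float] = []
    for n in range(out_len):
        pos = n * step
        i = min(int(pos), last)  # left neighbour, clamped to the final sample
        j = min(i + 1, last)     # right neighbour, clamped to the final sample
        frac = pos - i
        out.append(mono[i] + (mono[j] - mono[i]) * frac)
    return out
```

For example, one second of 48 kHz stereo audio would pass through `to_mono(samples, 2)` and then `resample_linear(mono, 48_000)`, yielding 24,000 mono samples.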
Next Steps
- Build a Voice Agent: Get started with voice
- Build a Video Agent: Add video processing
- Build Your Own Avatar: Subclass the Avatar base class