Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
Quick Start
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | `"qwen3-omni-flash-realtime"` | Qwen Realtime model |
| `voice` | `str` | `"Cherry"` | Voice for audio output |
| `fps` | `int` | `1` | Video frames per second |
| `include_video` | `bool` | `False` | Include video frames |
| `vad_silence_duration_ms` | `int` | `900` | Silence before turn end |
| `api_key` | `str` | `None` | API key (defaults to `DASHSCOPE_API_KEY` env var) |
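
To make the defaults concrete, here is a minimal sketch of how the parameters in the table might be modeled and resolved. `QwenRealtimeSettings` and its methods are hypothetical names used for illustration only, not part of the Vision Agents API. The only behavior taken from the table is the set of default values and the fallback of `api_key` to the `DASHSCOPE_API_KEY` environment variable. The frame-interval helper is an added assumption about how `fps` translates into timing.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class QwenRealtimeSettings:
    """Hypothetical container mirroring the documented defaults.

    This is an illustration, not a Vision Agents class.
    """

    model: str = "qwen3-omni-flash-realtime"
    voice: str = "Cherry"
    fps: int = 1
    include_video: bool = False
    vad_silence_duration_ms: int = 900
    api_key: Optional[str] = None

    def resolved_api_key(self) -> Optional[str]:
        # An explicit key wins; otherwise fall back to the environment variable.
        return self.api_key or os.environ.get("DASHSCOPE_API_KEY")

    def frame_interval_seconds(self) -> float:
        # Seconds between video frames at the configured rate (assumed meaning of fps).
        return 1.0 / self.fps


# Override only what differs from the defaults.
settings = QwenRealtimeSettings(include_video=True, fps=2)
print(settings.model, settings.voice, settings.frame_interval_seconds())
```

With this shape, leaving `api_key` unset keeps the key out of source code, since it is read from the environment when needed.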
Qwen Realtime does not support text input. Start speaking once you join the call.

