Qwen3 Realtime provides native audio I/O with built-in STT and TTS over WebSocket. No external speech services required.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

uv add "vision-agents[qwen]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import qwen, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=qwen.Realtime(fps=1),  # Enable video with fps > 0
)
Set DASHSCOPE_API_KEY in your environment.
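A missing key tends to surface only once the agent tries to connect, so it can help to check for it at startup. The following is a minimal sketch using only the standard library; `require_dashscope_key` is a hypothetical helper name, not part of Vision Agents.

```python
import os


def require_dashscope_key(env=None):
    """Return the DashScope API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration; not part of the Vision Agents API.
    `env` defaults to the process environment and can be replaced in tests.
    """
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set; export it before starting the agent."
        )
    return key
```

Calling `require_dashscope_key()` before constructing the agent turns a missing variable into an immediate, readable error.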

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | "qwen3-omni-flash-realtime" | Qwen Realtime model |
| voice | str | "Cherry" | Voice for audio output |
| fps | int | 1 | Video frames per second |
| include_video | bool | False | Include video frames |
| vad_silence_duration_ms | int | 900 | Silence before turn end |
| api_key | str | None | API key (defaults to DASHSCOPE_API_KEY env var) |
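As a sketch of how these parameters might be combined, the snippet below collects overrides in a dictionary and passes them to `qwen.Realtime` as keyword arguments. The parameter names and defaults come from the table above; the specific values (`fps=2`, `vad_silence_duration_ms=600`) are illustrative assumptions, and `build_llm` is a hypothetical helper.

```python
import os

# Keyword arguments for qwen.Realtime, using the names documented above.
# The non-default values are illustrative, not recommendations.
realtime_kwargs = {
    "model": "qwen3-omni-flash-realtime",  # documented default
    "voice": "Cherry",                     # documented default
    "fps": 2,                              # assumed value: two frames per second
    "include_video": True,                 # send video frames (default is False)
    "vad_silence_duration_ms": 600,        # assumed value: end turns sooner than the 900 ms default
    "api_key": os.environ.get("DASHSCOPE_API_KEY"),  # None falls back to the env var
}


def build_llm(kwargs):
    """Hypothetical helper: construct the realtime LLM from keyword arguments."""
    # Imported lazily so the configuration can be inspected without the plugin installed.
    from vision_agents.plugins import qwen

    return qwen.Realtime(**kwargs)
```

The result of `build_llm(realtime_kwargs)` would take the place of `qwen.Realtime(fps=1)` in the Quick Start agent.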
Qwen Realtime does not support text input. Start speaking once you join the call.

Next Steps