Qwen3 Realtime is a low-latency API from Alibaba that provides native audio output and built-in speech recognition over a WebSocket-based realtime connection. The Qwen Realtime plugin in the Vision Agents SDK is a native integration with out-of-the-box support for Qwen's realtime models: you can stream audio to Qwen over WebSockets and receive responses in real time. Because the model includes built-in STT and TTS, no external speech services are required. This makes it ideal for building conversational agents, AI avatars, customer service bots, interactive tutors, and much more!

Features

  • Native audio output: No TTS service needed - audio comes directly from the model
  • Built-in STT: Integrated speech-to-text using gummy-realtime-v1 - no external STT service required
  • Server-side VAD: Automatic turn detection with configurable silence thresholds
  • Video understanding: Optional video frame support for multimodal interactions
  • Real-time streaming: WebSocket-based bidirectional communication for low-latency responses
  • Interruption handling: Automatic cancellation when user starts speaking

Installation

Install the Qwen plugin with:
uv add vision-agents[qwen]
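If you manage dependencies with pip rather than uv, the equivalent install is (quoting the extra so your shell does not expand the brackets):
pip install "vision-agents[qwen]"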

Tutorials

The Voice AI quickstart and Video AI quickstart pages have examples to get you up and running.

Example

Check out our Qwen Realtime example to see a practical implementation of the plugin and get inspiration for your own projects, or read on for some key details.

Initialization

Within the Vision Agents SDK, the Qwen integration is exposed as the Realtime class:
from vision_agents.plugins import qwen

realtime = qwen.Realtime()

Parameters

These are the parameters available in the qwen.Realtime plugin:
Name | Type | Default | Description
model | str | "qwen3-omni-flash-realtime" | The Qwen Realtime model identifier.
api_key | str or None | None | DashScope API key. If not provided, reads from the DASHSCOPE_API_KEY env var.
base_url | str or None | "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime" | WebSocket API base URL.
voice | str | "Cherry" | Voice for audio output.
fps | int | 1 | Video frames per second to send.
include_video | bool | False | Include video frames in requests.
video_width | int | 1280 | Video frame width in pixels.
video_height | int | 720 | Video frame height in pixels.
audio_transcription_model | str | "gummy-realtime-v1" | Model used for audio transcription.
vad_threshold | float | 0.1 | Voice activity detection threshold.
vad_prefix_padding_ms | int | 500 | VAD prefix padding in milliseconds.
vad_silence_duration_ms | int | 900 | VAD silence duration in milliseconds.
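For example, you can tune the voice and turn-detection behavior at construction time. The values below are illustrative, not recommendations:
from vision_agents.plugins import qwen

# Illustrative configuration; values are examples only.
realtime = qwen.Realtime(
    voice="Cherry",                # voice used for audio output
    vad_threshold=0.1,             # voice activity detection sensitivity
    vad_prefix_padding_ms=500,     # audio kept before detected speech
    vad_silence_duration_ms=1200,  # wait longer before ending the user's turn
)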

Environment variables

Set DASHSCOPE_API_KEY in your environment or .env file:
export DASHSCOPE_API_KEY=your_dashscope_api_key_here

Usage

Here’s a complete example:
from dotenv import load_dotenv
from vision_agents.core import Agent, User, cli
from vision_agents.core.agents import AgentLauncher
from vision_agents.plugins import getstream, qwen

load_dotenv()

async def create_agent(**kwargs) -> Agent:
    llm = qwen.Realtime(fps=1)

    agent = Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Qwen Assistant", id="agent"),
        instructions="You are a helpful AI assistant. Be friendly and conversational.",
        llm=llm,
    )
    return agent

async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    await agent.create_user()
    call = await agent.create_call(call_type, call_id)

    with await agent.join(call):
        await agent.edge.open_demo(call)
        await agent.finish()

if __name__ == "__main__":
    cli(AgentLauncher(create_agent=create_agent, join_call=join_call))
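To try the example, save it as main.py and run it. The exact invocation depends on your project setup, but with uv it would typically be:
uv run python main.py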

Functionality

Connect

The connect() method establishes a WebSocket connection to Qwen Realtime:
await realtime.connect()

Send audio

The simple_audio_response() method allows you to send audio data to Qwen:
await realtime.simple_audio_response(pcm_data)
Note that Qwen Realtime does not support text input; once you join the call, simply start speaking to the agent.
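As a minimal sketch, assuming simple_audio_response() accepts raw 16-bit mono PCM bytes (the sample rate and framing here are assumptions; check the SDK's audio utilities for the exact expected format):
import numpy as np

# 100 ms of silence as 16-bit mono PCM at 16 kHz; the format
# details are assumptions, not the SDK's documented contract.
sample_rate = 16000
samples = np.zeros(int(sample_rate * 0.1), dtype=np.int16)
pcm_data = samples.tobytes()

await realtime.simple_audio_response(pcm_data)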

Watch video track

For video-enabled agents, you can watch a video track to send frames to Qwen:
await realtime.watch_video_track(track)
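To enable video, construct the plugin with the video parameters from the table above. The values shown are the documented defaults, with include_video switched on:
from vision_agents.plugins import qwen

# Video frames are captured at `fps` and sent at video_width x video_height.
realtime = qwen.Realtime(
    include_video=True,
    fps=1,
    video_width=1280,
    video_height=720,
)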

Events

The Qwen plugin emits standard Vision Agents events that you can listen to:
  • RealtimeAudioOutputEvent: Fired when Qwen generates audio
  • LLMResponseChunkEvent: Fired when Qwen generates text
  • RealtimeUserSpeechTranscriptionEvent: Fired for user speech transcriptions
  • RealtimeAgentSpeechTranscriptionEvent: Fired for agent speech transcriptions
  • LLMErrorEvent: Fired when an error occurs
Access these events through the Agent’s event system. See the Event System guide for more details.
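As a minimal sketch of listening for one of these events, assuming the decorator-based subscription pattern from the Event System guide (the import path below is an assumption; check the guide for the actual module):
# Import path is an assumption; see the Event System guide for the
# module that actually defines these event classes.
from vision_agents.core.events import RealtimeUserSpeechTranscriptionEvent

@agent.events.subscribe
async def on_user_transcript(event: RealtimeUserSpeechTranscriptionEvent):
    # React to the user's transcribed speech; the payload shape is
    # documented in the Event System guide.
    print("User transcript event:", event)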

Notes

  • The model is hosted in Singapore, so latency may vary depending on your location
  • The model does not support text input - once you join the call, simply start speaking to the agent
  • No external STT or TTS services are required - Qwen Realtime provides both natively