AssemblyAI

AssemblyAI provides real-time streaming speech-to-text with built-in punctuation-based turn detection and sub-300ms latency.

Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

uv add "vision-agents[assemblyai]"

Quick start

from vision_agents.core import Agent, User
from vision_agents.plugins import assemblyai, cartesia, gemini, getstream

agent = Agent(
    edge=getstream.Edge(),  # Stream provides the real-time transport
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM(),
    stt=assemblyai.STT(),  # reads ASSEMBLYAI_API_KEY from the environment
    tts=cartesia.TTS(),
)

Set `ASSEMBLYAI_API_KEY` in your environment, or pass `api_key` directly to `assemblyai.STT()`.
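The precedence between the two options can be sketched in plain Python. The helper `resolve_api_key` is hypothetical and not part of the plugin; it only illustrates an explicit `api_key` argument taking priority, with the `ASSEMBLYAI_API_KEY` environment variable as the fallback.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to the env var."""
    key = api_key or os.environ.get("ASSEMBLYAI_API_KEY")
    if not key:
        # Fail early with a clear message instead of at connection time.
        raise ValueError("Set ASSEMBLYAI_API_KEY or pass api_key explicitly")
    return key
```

Failing early like this can make a missing key easier to diagnose than an authentication error raised once streaming starts.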

STT

Real-time transcription using AssemblyAI’s Universal-3 Pro model with built-in turn detection.

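from vision_agents.plugins import assemblyai

stt = assemblyai.STT(
    speech_model="u3-rt-pro",  # model identifier (the default)
    sample_rate=16000,         # audio sample rate in Hz (the default)
)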
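To make the `sample_rate` setting concrete, the arithmetic below computes how much audio a chunk of a given duration holds at 16 kHz. It assumes 16-bit mono PCM, which this page does not state, and the helper names are illustrative only.

```python
def samples_per_chunk(sample_rate: int, chunk_ms: int) -> int:
    """Number of audio samples in a chunk lasting chunk_ms milliseconds."""
    return sample_rate * chunk_ms // 1000


def bytes_per_chunk(
    sample_rate: int,
    chunk_ms: int,
    bytes_per_sample: int = 2,  # assumption: 16-bit PCM
    channels: int = 1,          # assumption: mono audio
) -> int:
    """Chunk size in bytes for the given sample format."""
    return samples_per_chunk(sample_rate, chunk_ms) * bytes_per_sample * channels


print(samples_per_chunk(16000, 50))  # 800
print(bytes_per_chunk(16000, 50))    # 1600
```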

With keyterms boosting

Boost recognition accuracy for specific terms:

stt = assemblyai.STT(
    keyterms_prompt=["AssemblyAI", "Vision Agents"],
)
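The Parameters table notes that `keyterms_prompt` cannot be combined with `prompt`. When STT options come from user configuration, a small guard can catch that combination before the plugin is constructed. This is a minimal sketch: `build_stt_kwargs` is a hypothetical helper, not part of the plugin, and how the plugin itself reacts when both are set is not documented here.

```python
from typing import Any, Optional


def build_stt_kwargs(
    prompt: Optional[str] = None,
    keyterms_prompt: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build keyword arguments for assemblyai.STT,
    rejecting the prompt + keyterms_prompt combination."""
    if prompt is not None and keyterms_prompt is not None:
        raise ValueError("prompt and keyterms_prompt cannot be combined")
    kwargs: dict[str, Any] = {}
    if prompt is not None:
        kwargs["prompt"] = prompt
    if keyterms_prompt is not None:
        kwargs["keyterms_prompt"] = list(keyterms_prompt)
    return kwargs
```

The result could then be passed along as `assemblyai.STT(**build_stt_kwargs(keyterms_prompt=["AssemblyAI", "Vision Agents"]))`.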

With custom turn silence thresholds

Configure turn detection timing:

stt = assemblyai.STT(
    min_turn_silence=100,   # ms before speculative EOT check
    max_turn_silence=1200,  # ms before forcing turn end
)
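One way to read the two thresholds is as boundaries on the current silence duration. The sketch below is a simplified mental model, not AssemblyAI's actual algorithm: `classify_silence` and its return labels are hypothetical, and whether the boundaries are inclusive is an assumption.

```python
def classify_silence(
    silence_ms: int,
    min_turn_silence: int = 100,
    max_turn_silence: int = 1200,
) -> str:
    """Hypothetical model of how the two silence thresholds partition time."""
    if min_turn_silence > max_turn_silence:
        raise ValueError("min_turn_silence must not exceed max_turn_silence")
    if silence_ms >= max_turn_silence:
        # Silence has lasted long enough that the turn is ended regardless.
        return "force_end_of_turn"
    if silence_ms >= min_turn_silence:
        # Enough silence to check speculatively whether the turn is over.
        return "speculative_check"
    return "keep_listening"


print(classify_silence(50))    # keep_listening
print(classify_silence(400))   # speculative_check
print(classify_silence(1500))  # force_end_of_turn
```

Under this reading, lowering `min_turn_silence` makes the agent check for a finished turn sooner, while `max_turn_silence` caps how long it waits in the worst case.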

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `api_key` | `str` | `None` | API key (defaults to `ASSEMBLYAI_API_KEY` env var) |
| `speech_model` | `str` | `"u3-rt-pro"` | Model identifier |
| `sample_rate` | `int` | `16000` | Audio sample rate in Hz |
| `min_turn_silence` | `int` | API default | Silence (ms) before speculative end-of-turn check |
| `max_turn_silence` | `int` | API default | Maximum silence (ms) before forcing turn end |
| `prompt` | `str` | `None` | Custom transcription prompt (cannot be combined with `keyterms_prompt`) |
| `keyterms_prompt` | `list[str]` | `None` | List of terms to boost recognition for (cannot be combined with `prompt`) |
| `max_reconnect_attempts` | `int` | `3` | Maximum reconnect attempts on transient failures |
| `reconnect_backoff_initial_s` | `float` | `0.5` | Initial backoff delay in seconds |
| `reconnect_backoff_max_s` | `float` | `4.0` | Maximum backoff delay in seconds |
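
The three reconnect parameters suggest a capped backoff between attempts. The sketch below assumes the delay doubles after each failed attempt, which this page does not confirm; `backoff_delays` is a hypothetical helper for estimating the worst-case wait before the plugin gives up.

```python
def backoff_delays(
    max_attempts: int = 3,
    initial_s: float = 0.5,
    max_s: float = 4.0,
) -> list[float]:
    """Hypothetical schedule: double the delay each attempt, capped at max_s."""
    delays = []
    delay = initial_s
    for _ in range(max_attempts):
        delays.append(min(delay, max_s))
        delay *= 2  # assumption: exponential growth with factor 2
    return delays


print(backoff_delays())                # [0.5, 1.0, 2.0]
print(backoff_delays(max_attempts=6))  # [0.5, 1.0, 2.0, 4.0, 4.0, 4.0]
```

With the defaults and this doubling assumption, the total wait across all three attempts would be 3.5 seconds.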

Next steps

Build a Voice Agent

Build a Video Agent
