Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
Quick start
STT
Real-time transcription using AssemblyAI’s Universal-3 Pro model with built-in turn detection.
With keyterms boosting
Boost recognition accuracy for specific terms:
With custom turn silence thresholds
Configure turn detection timing:
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | `None` | API key (defaults to the `ASSEMBLYAI_API_KEY` env var) |
| `speech_model` | `str` | `"u3-rt-pro"` | Model identifier |
| `sample_rate` | `int` | `16000` | Audio sample rate in Hz |
| `min_turn_silence` | `int` | API default | Silence (ms) before speculative end-of-turn check |
| `max_turn_silence` | `int` | API default | Maximum silence (ms) before forcing turn end |
| `prompt` | `str` | `None` | Custom transcription prompt (cannot be combined with `keyterms_prompt`) |
| `keyterms_prompt` | `list[str]` | `None` | List of terms to boost recognition for (cannot be combined with `prompt`) |
| `max_reconnect_attempts` | `int` | `3` | Maximum reconnect attempts on transient failures |
| `reconnect_backoff_initial_s` | `float` | `0.5` | Initial backoff delay in seconds |
| `reconnect_backoff_max_s` | `float` | `4.0` | Maximum backoff delay in seconds |
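
To make the interaction between these parameters concrete, here is a minimal, standalone sketch. `SttOptions` and `backoff_delays` are hypothetical names used only for illustration; they are not part of the plugin. The sketch mirrors the defaults from the table, enforces the documented rule that `prompt` and `keyterms_prompt` cannot be combined, and assumes (the table does not say so) that the reconnect delay doubles after each attempt until it reaches `reconnect_backoff_max_s`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SttOptions:
    """Hypothetical mirror of the parameter table; not the plugin's real class."""

    api_key: Optional[str] = None
    speech_model: str = "u3-rt-pro"
    sample_rate: int = 16000
    min_turn_silence: Optional[int] = None  # None means "use the API default"
    max_turn_silence: Optional[int] = None  # None means "use the API default"
    prompt: Optional[str] = None
    keyterms_prompt: Optional[list[str]] = None
    max_reconnect_attempts: int = 3
    reconnect_backoff_initial_s: float = 0.5
    reconnect_backoff_max_s: float = 4.0

    def __post_init__(self) -> None:
        # Documented constraint: the two prompt options are mutually exclusive.
        if self.prompt is not None and self.keyterms_prompt is not None:
            raise ValueError("prompt and keyterms_prompt cannot be combined")

    def backoff_delays(self) -> list[float]:
        # Assumption: delays double per attempt and are capped at the maximum.
        return [
            min(self.reconnect_backoff_initial_s * 2**attempt,
                self.reconnect_backoff_max_s)
            for attempt in range(self.max_reconnect_attempts)
        ]


print(SttOptions().backoff_delays())  # [0.5, 1.0, 2.0]
```

Under that doubling assumption, the default settings wait 0.5 s, 1.0 s, and 2.0 s before the three reconnect attempts, so the 4.0 s cap only comes into play when `max_reconnect_attempts` is raised above 3.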

