STT streams PCM audio to Cartesia Ink and emits transcript and turn events that Vision Agents uses for interruption and eager turn handling.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
Quick Start
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | `"ink-2"` | Cartesia STT model |
| `sample_rate` | `int` | `16000` | PCM sample rate (Hz) sent to Cartesia |
| `encoding` | `str` | `"pcm_s16le"` | PCM encoding sent to Cartesia |
| `cartesia_version` | `str` | `"2026-03-01"` | Cartesia API version used for the turn-detection websocket |
| `api_key` | `str` | `None` | API key (defaults to the `CARTESIA_API_KEY` env var) |
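
To make the defaults above concrete, here is a minimal sketch in plain Python. It does not use the Vision Agents or Cartesia APIs: `CartesiaSTTSettings`, `resolved_api_key`, and `to_pcm_s16le` are hypothetical names introduced only for illustration. The sketch assumes the plugin falls back to `CARTESIA_API_KEY` when `api_key` is `None`, as the table states, and shows what `pcm_s16le` audio looks like as bytes.

```python
import os
import struct
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CartesiaSTTSettings:
    """Hypothetical container mirroring the parameter table (not a library class)."""

    model: str = "ink-2"
    sample_rate: int = 16000
    encoding: str = "pcm_s16le"
    cartesia_version: str = "2026-03-01"
    api_key: Optional[str] = None

    def resolved_api_key(self) -> Optional[str]:
        # An explicit key wins; otherwise fall back to the environment variable.
        return self.api_key or os.environ.get("CARTESIA_API_KEY")


def to_pcm_s16le(samples: Sequence[float]) -> bytes:
    """Pack float samples in [-1.0, 1.0] as signed 16-bit little-endian PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)
```

With these defaults, each sample occupies two bytes, so one second of mono audio at 16000 Hz is 32000 bytes.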
Next Steps
Cartesia TTS: Low-latency text-to-speech
Build a Voice Agent: Get started with voice

