Cartesia STT

Cartesia provides low-latency speech-to-text with the Ink model. The `cartesia.STT` plugin streams PCM audio to Cartesia Ink and emits transcript and turn events, which Vision Agents uses for interruption handling and eager turn handling.

Vision Agents uses Stream Video for real-time WebRTC transport by default. External WebRTC transports are supported as well. Most AI providers offer free tiers to get started.

Cartesia also provides low-latency text-to-speech. You can use both in the same agent.

Installation

uv add "vision-agents[cartesia]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import cartesia, gemini, getstream

# Cartesia handles both speech-to-text and text-to-speech; Gemini is the LLM.
agent = Agent(
    edge=getstream.Edge(),  # Stream Video WebRTC transport
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    stt=cartesia.STT(),  # reads CARTESIA_API_KEY from the environment
    tts=cartesia.TTS(),
)

Set `CARTESIA_API_KEY` in your environment, or pass `api_key` directly to `cartesia.STT()`.
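
The key lookup can be sketched with the standard library alone. The helper name `resolve_cartesia_api_key` is hypothetical and not part of Vision Agents. It assumes an explicit `api_key` takes precedence over the `CARTESIA_API_KEY` environment variable, and it fails early when neither is set.

```python
import os
from typing import Optional


def resolve_cartesia_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else use the env var."""
    key = api_key or os.environ.get("CARTESIA_API_KEY")
    if not key:
        # Fail before the agent starts instead of at the first WebSocket connect.
        raise ValueError("Set CARTESIA_API_KEY or pass api_key explicitly")
    return key


# Usage with the plugin (not executed here; requires vision-agents[cartesia]):
#   stt = cartesia.STT(api_key=resolve_cartesia_api_key())
```

Resolving the key yourself is mainly useful when it comes from a secrets manager instead of the process environment.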

Parameters

Name	Type	Default	Description
`api_key`	`str`	`None`	API key (defaults to `CARTESIA_API_KEY` env var)
`model`	`str`	`"ink-2"`	Cartesia STT model
`sample_rate`	`int`	`16000`	PCM sample rate (Hz) sent to Cartesia. One of `8000`, `16000`, `22050`, `24000`, `44100`, `48000`
`encoding`	`str`	`"pcm_s16le"`	PCM encoding sent to Cartesia
`cartesia_version`	`str`	`"2026-03-01"`	Cartesia API version used for the turn-detection websocket
`websocket_url`	`str`	`"wss://api.cartesia.ai/stt/turns/websocket"`	WebSocket endpoint (mainly useful for tests)
`audio_chunk_duration_ms`	`int`	`100`	Maximum duration per WebSocket audio frame (ms)
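
To see how `sample_rate`, `encoding`, and `audio_chunk_duration_ms` interact, the sketch below computes the largest audio frame, in bytes, that fits in one chunk and splits a PCM buffer accordingly. It assumes mono audio and 2 bytes per sample (`pcm_s16le`). The helper names `max_chunk_bytes` and `split_pcm` are hypothetical, and this is not necessarily how the plugin chunks audio internally.

```python
from typing import Iterator

# Sample rates listed in the parameters table.
SUPPORTED_SAMPLE_RATES = (8000, 16000, 22050, 24000, 44100, 48000)
BYTES_PER_SAMPLE = 2  # pcm_s16le: signed 16-bit little-endian


def max_chunk_bytes(
    sample_rate: int = 16000,
    chunk_duration_ms: int = 100,
    channels: int = 1,  # assumption: mono input
) -> int:
    """Largest frame size in bytes for the given chunk duration."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"Unsupported sample rate: {sample_rate}")
    samples = sample_rate * chunk_duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * channels


def split_pcm(pcm: bytes, chunk_bytes: int) -> Iterator[bytes]:
    """Yield consecutive slices of at most chunk_bytes bytes."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


one_second_of_silence = bytes(16000 * BYTES_PER_SAMPLE)
frames = list(split_pcm(one_second_of_silence, max_chunk_bytes()))
```

With the defaults (16 kHz, 100 ms), each frame holds 1600 samples, or 3200 bytes, so one second of mono audio splits into 10 frames.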

Next Steps

Cartesia TTS: Low-latency text-to-speech

Build a Voice Agent: Get started with voice
