Deepgram provides low-latency text-to-speech synthesis with the Aura-2 model.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Deepgram also provides fast, accurate speech-to-text with built-in turn detection. You can use both in the same agent.
Installation
uv add "vision-agents[deepgram]"
Quick Start
from vision_agents.core import Agent, User
from vision_agents.plugins import deepgram, gemini, getstream

agent = Agent(
    edge=getstream.Edge(),  # real-time transport over Stream
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    stt=deepgram.STT(),  # Deepgram speech-to-text
    tts=deepgram.TTS(),  # Deepgram text-to-speech
)
Set DEEPGRAM_API_KEY in your environment or pass api_key directly.
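The two ways of supplying the key can be combined so that a missing key is caught at startup rather than on the first synthesis request. The following is a minimal sketch: `resolve_deepgram_api_key` is a hypothetical helper, not part of Vision Agents, and it assumes only the `DEEPGRAM_API_KEY` variable and the `api_key` parameter described above.

```python
import os


def resolve_deepgram_api_key(api_key=None):
    """Return an explicit key if given, else DEEPGRAM_API_KEY from the environment.

    Hypothetical helper for illustration; not part of the Vision Agents API.
    """
    key = api_key or os.environ.get("DEEPGRAM_API_KEY")
    if not key:
        raise RuntimeError(
            "No Deepgram API key found: set DEEPGRAM_API_KEY or pass api_key."
        )
    return key
```

The result could then be passed explicitly, for example `deepgram.TTS(api_key=resolve_deepgram_api_key())`.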
Parameters
tts = deepgram.TTS(
    model="aura-2",
    voice="aura-asteria-en",
)
| Name | Type | Default | Description |
|---|---|---|---|
| model | str | "aura-2" | TTS model |
| voice | str | "aura-asteria-en" | Voice ID (available voices) |
| api_key | str | None | API key (defaults to DEEPGRAM_API_KEY env var) |
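If the TTS settings come from application config, it can help to merge them with the documented defaults before constructing the plugin. The sketch below is illustrative only: `TTS_DEFAULTS` and `tts_kwargs` are hypothetical names, and the default values are copied from the parameter table above rather than read from the library.

```python
# Defaults as listed in the parameter table (assumed to match the plugin).
TTS_DEFAULTS = {"model": "aura-2", "voice": "aura-asteria-en", "api_key": None}


def tts_kwargs(**overrides):
    """Merge overrides into the documented defaults and reject unknown names."""
    unknown = set(overrides) - set(TTS_DEFAULTS)
    if unknown:
        raise TypeError(f"Unknown TTS parameter(s): {sorted(unknown)}")
    merged = {**TTS_DEFAULTS, **overrides}
    # Omit unset values so the plugin can fall back to DEEPGRAM_API_KEY.
    return {name: value for name, value in merged.items() if value is not None}
```

A call such as `deepgram.TTS(**tts_kwargs(voice="aura-asteria-en"))` would then surface a misspelled parameter name immediately, before any audio is requested.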
Next Steps
Deepgram STT: Real-time speech-to-text
Build a Voice Agent: Get started with voice