Deepgram provides fast, accurate real-time speech-to-text with built-in turn detection, which makes it well suited to conversational agents.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Deepgram also provides low-latency text-to-speech. You can use both in the same agent.
Installation
uv add "vision-agents[deepgram]"
Quick Start
from vision_agents.core import Agent, User
from vision_agents.plugins import deepgram, gemini, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    stt=deepgram.STT(),
    tts=deepgram.TTS(),
)
Set DEEPGRAM_API_KEY in your environment or pass api_key directly.
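The two ways of supplying the key can be sketched as a small lookup. This is only an illustration: `resolve_deepgram_api_key` is a hypothetical helper, not part of the plugin, and the assumption that an explicit `api_key` takes precedence over the environment variable is inferred from the parameter description rather than confirmed against the plugin source.

```python
import os


def resolve_deepgram_api_key(api_key=None):
    """Hypothetical helper (not part of vision-agents).

    Returns the explicit api_key if one is given, otherwise falls back
    to the DEEPGRAM_API_KEY environment variable. Raises early with a
    readable message when neither is set.
    """
    key = api_key or os.environ.get("DEEPGRAM_API_KEY")
    if not key:
        raise RuntimeError(
            "No Deepgram key found: set DEEPGRAM_API_KEY or pass api_key."
        )
    return key
```

The result could then be passed along as `deepgram.STT(api_key=resolve_deepgram_api_key())`, which surfaces a missing key at startup instead of on the first transcription request.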
Parameters
stt = deepgram.STT(
    model="nova-3",
    language="en",
    eager_turn_detection=True,
)
| Name | Type | Default | Description |
|---|---|---|---|
| model | str | "nova-3" | Deepgram model |
| language | str | "en" | Language code |
| eager_turn_detection | bool | False | Enable faster turn detection |
| api_key | str | None | API key (defaults to DEEPGRAM_API_KEY env var) |
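As a rough sketch of how the documented defaults and per-agent overrides fit together, the table can be mirrored in a plain dictionary. `STT_DEFAULTS` and `build_stt_kwargs` are hypothetical names introduced here for illustration. The default values are copied from the table, and the plugin itself may accept further parameters that are not listed.

```python
# Defaults copied from the parameter table; not read from the plugin itself.
STT_DEFAULTS = {
    "model": "nova-3",
    "language": "en",
    "eager_turn_detection": False,
    "api_key": None,
}


def build_stt_kwargs(**overrides):
    """Hypothetical helper: merge overrides onto the documented defaults.

    Rejects names that are not in the table so that typos such as
    `eager_turn_detect` fail loudly instead of being silently ignored.
    """
    unknown = set(overrides) - set(STT_DEFAULTS)
    if unknown:
        raise TypeError(f"Unknown STT parameter(s): {sorted(unknown)}")
    return {**STT_DEFAULTS, **overrides}
```

With the plugin installed, the merged settings could be forwarded as `deepgram.STT(**build_stt_kwargs(eager_turn_detection=True))`.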
Next Steps
Deepgram TTS: Low-latency text-to-speech
Build a Voice Agent: Get started with voice