Deepgram provides fast, accurate real-time speech-to-text with built-in turn detection, which makes it well suited to conversational agents.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Deepgram also provides low-latency text-to-speech. You can use both in the same agent.

Installation

uv add "vision-agents[deepgram]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import deepgram, gemini, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    stt=deepgram.STT(),
    tts=deepgram.TTS(),
)
Set DEEPGRAM_API_KEY in your environment, or pass api_key directly to deepgram.STT().
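
The lookup order described here (an explicit argument first, then the DEEPGRAM_API_KEY environment variable) can be sketched as a small standalone helper. This is an illustration only, not the plugin's actual implementation: the name `resolve_api_key` is hypothetical, and raising an error when no key is found is an assumption about what a caller might want.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: prefer an explicit key, else DEEPGRAM_API_KEY."""
    # Fall back to the real process environment unless a mapping is injected
    # (injecting one keeps the helper easy to test).
    env = os.environ if env is None else env
    key = api_key or env.get("DEEPGRAM_API_KEY")
    if not key:
        # Assumption: failing early is preferable to a late authentication error.
        raise RuntimeError("Set DEEPGRAM_API_KEY or pass api_key explicitly.")
    return key
```

The result could then be passed along as `deepgram.STT(api_key=resolve_api_key())`, so that a missing key surfaces at startup rather than on the first transcription request.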

Parameters

stt = deepgram.STT(
    model="nova-3",
    language="en",
    eager_turn_detection=True,
)

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | "nova-3" | Deepgram model |
| language | str | "en" | Language code |
| eager_turn_detection | bool | False | Enable faster turn detection |
| api_key | str | None | API key (defaults to DEEPGRAM_API_KEY env var) |
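
One way to keep these settings in a single place is a small dataclass whose defaults mirror the documented ones. This is a sketch under assumptions: `STTSettings` and `to_kwargs` are hypothetical names that are not part of Vision Agents, and only the four documented parameters are modeled.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class STTSettings:
    """Hypothetical container mirroring the documented deepgram.STT parameters."""

    model: str = "nova-3"
    language: str = "en"
    eager_turn_detection: bool = False
    api_key: Optional[str] = None

    def to_kwargs(self) -> Dict[str, Any]:
        # Omit api_key when unset so the DEEPGRAM_API_KEY fallback applies.
        return {k: v for k, v in asdict(self).items() if v is not None}


# Override only what differs from the defaults.
settings = STTSettings(eager_turn_detection=True)
stt_kwargs = settings.to_kwargs()
# With the plugin installed, these could be passed as: deepgram.STT(**stt_kwargs)
```

Keeping the defaults in one frozen object makes it straightforward to log or compare configurations across agents without constructing the STT client itself.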

Next Steps

Deepgram TTS: Low-latency text-to-speech

Build a Voice Agent: Get started with voice